What are the responsibilities and job description for the Databricks / PySpark Data Engineer position at McLaren Strategic Solutions (MSS)?
We are seeking a hands-on Data Engineer with strong Databricks and PySpark experience to build scalable data pipelines and analytics applications within a modern data platform.
This role focuses on modernizing legacy ETL and reporting systems (Teradata, Informatica, Tableau) into Databricks-native pipelines, dashboards, and Python-based data applications.
The ideal candidate is comfortable working across both data engineering and lightweight application development.
Responsibilities
- Design, develop, and maintain scalable data pipelines using PySpark on Databricks
- Build analytics data models and transformation workflows for enterprise reporting and analytics
- Migrate legacy ETL workloads from platforms such as Informatica and Teradata to Databricks
- Develop Databricks-native dashboards and analytics applications to replace traditional BI tools
- Build lightweight Python-based data applications (e.g., FastAPI) to expose and interact with data
- Integrate Databricks pipelines with APIs and application services
- Implement Slowly Changing Dimensions (SCD) and dimensional data modeling techniques
- Develop reusable data engineering frameworks and standardized pipelines
- Optimize Spark workloads for performance, scalability, and cost efficiency
- Collaborate with analytics and business teams to deliver user-facing data solutions
- Leverage AI-assisted coding tools (e.g., Copilot, ChatGPT) to improve development productivity
- Contribute to best practices for modern data engineering and analytics application development
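To illustrate the Slowly Changing Dimension work mentioned above: the sketch below shows Type 2 (history-preserving) merge logic in plain Python. In this role it would typically be expressed as a Delta Lake MERGE or PySpark job on Databricks; the function name, field names, and single-row-per-key assumption here are illustrative only, not part of the posting.

```python
from datetime import date

def scd2_merge(dimension, incoming, key, tracked, today):
    """Type 2 SCD: expire changed rows and append new versions.

    dimension: list of dicts carrying 'is_current', 'valid_from', 'valid_to'
    incoming:  source dicts, assumed one row per natural key
    tracked:   attributes whose change triggers a new version
    """
    current = {row[key]: row for row in dimension if row["is_current"]}
    for src in incoming:
        old = current.get(src[key])
        if old is None:
            # New key: insert the first version as current
            dimension.append({**src, "valid_from": today,
                              "valid_to": None, "is_current": True})
        elif any(old[a] != src[a] for a in tracked):
            # Tracked attribute changed: close out old row, append new version
            old["valid_to"] = today
            old["is_current"] = False
            dimension.append({**src, "valid_from": today,
                              "valid_to": None, "is_current": True})
        # Unchanged rows are left untouched
    return dimension

dim = []
scd2_merge(dim, [{"id": 1, "city": "NYC"}], "id", ["city"], date(2024, 1, 1))
scd2_merge(dim, [{"id": 1, "city": "SF"}], "id", ["city"], date(2024, 6, 1))
# dim now holds two versions of id 1; only the SF row is current
```

The same close-out/insert pattern maps directly onto a `MERGE INTO ... WHEN MATCHED / WHEN NOT MATCHED` statement against a Delta table.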
Required Experience
- 5 years of hands-on experience building data pipelines using PySpark in production environments
- Strong experience with the Databricks platform (workspaces, clusters, Jobs & Workflows, Unity Catalog)
- Experience building analytics dashboards within Databricks (Databricks SQL)
- Proven experience designing and building scalable ETL/ELT data pipelines
- Strong Python development skills, including building REST APIs or data services
- Experience building or supporting data-driven applications (not just traditional ETL pipelines)
- Solid understanding of data modeling, including dimensional modeling and transformation patterns
- Experience using AI-assisted development tools in engineering workflows, such as Codex and Claude
- Exposure to LLM integration or AI-powered data applications
- Familiarity with cloud platforms (AWS, Azure, or GCP)
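As a small illustration of the dimensional-modeling expectation above, this plain-Python sketch splits denormalized records into a star-schema dimension table (deduplicated, surrogate-keyed) and a fact table that references it. On this platform the same pattern would be written as PySpark transformations; all names here are hypothetical.

```python
def build_star_schema(records, dim_attrs, measure_attrs):
    """Split denormalized rows into a dimension table and a fact table."""
    seen = {}          # natural-key tuple -> surrogate key
    dimension = []
    facts = []
    for rec in records:
        natural = tuple(rec[a] for a in dim_attrs)
        sk = seen.get(natural)
        if sk is None:
            sk = len(dimension) + 1          # simple sequential surrogate key
            seen[natural] = sk
            dimension.append({"dim_key": sk,
                              **{a: rec[a] for a in dim_attrs}})
        facts.append({"dim_key": sk,
                      **{m: rec[m] for m in measure_attrs}})
    return dimension, facts

rows = [
    {"product": "A", "region": "East", "amount": 10},
    {"product": "A", "region": "East", "amount": 5},
    {"product": "B", "region": "West", "amount": 7},
]
dims, facts = build_star_schema(rows, ["product", "region"], ["amount"])
# dims holds 2 deduplicated dimension rows; facts holds 3 rows, each with a dim_key
```

In production the surrogate key would come from an identity column or hash rather than list position, but the dedupe-then-reference structure is the core of the pattern.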
Preferred Experience
- Experience migrating from legacy platforms: Informatica → Databricks / Spark; Teradata → cloud-native data platforms; Tableau (or similar) → Databricks-native dashboards
- Experience with FastAPI or similar Python frameworks for data applications
- Exposure to CI/CD pipelines for data engineering workflows
- Understanding of microservices architecture and scalable application design
- Experience in Healthcare Payor domain (e.g., claims processing, member data, provider data, eligibility, or billing systems)