What are the responsibilities and job description for the Lead Data Engineer position at Capgemini?
Job Title: Lead Data Engineer (Cloud & AI Platforms)
Location: New York, NY
Employment type: Full-time (no H-1B visa sponsorship available)
Role Summary
We are seeking an experienced Lead Data Engineer to define, design, and govern scalable, secure, and high-performing data architectures across modern cloud data platforms. This role sets the architectural vision for enterprise data solutions built on Databricks, Azure, Snowflake, and advanced MLOps frameworks such as MLflow.
The ideal candidate combines deep technical expertise with strong architectural leadership, working closely with engineering, data science, and business stakeholders to enable analytics, AI, and machine learning initiatives. The Lead Data Engineer is also expected to be hands-on where required, actively leveraging AI coding agents such as AMP and Claude Code to accelerate design validation, solution development, and engineering productivity.
Key Responsibilities
Architecture & Design
- Define end-to-end cloud data architecture for analytics, AI, and machine learning platforms
- Establish architectural standards, reference architectures, and design patterns using Databricks, Azure, and Snowflake
- Design scalable, resilient, and cost-efficient data ingestion, transformation, and serving architectures (ETL/ELT)
- Architect lakehouse and warehouse solutions enabling both batch and real-time data processing
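For illustration, the ingestion, transformation, and serving layers described above can be sketched as a minimal batch ELT flow. All table names and sample data here are hypothetical, and a production implementation would run on Databricks or Snowflake rather than in-memory Python:

```python
from datetime import date

# Extract: raw events as they might land in a bronze/landing layer (hypothetical sample data).
raw_events = [
    {"user_id": "u1", "amount": "19.99", "ts": "2024-01-05"},
    {"user_id": "u2", "amount": "5.00",  "ts": "2024-01-05"},
    {"user_id": "u1", "amount": "3.50",  "ts": "2024-01-06"},
]

def transform(events):
    """Cast types and standardize the schema (silver layer)."""
    return [
        {"user_id": e["user_id"],
         "amount": float(e["amount"]),
         "event_date": date.fromisoformat(e["ts"])}
        for e in events
    ]

def serve(rows):
    """Aggregate into a serving view (gold layer): total spend per user."""
    totals = {}
    for r in rows:
        totals[r["user_id"]] = round(totals.get(r["user_id"], 0.0) + r["amount"], 2)
    return totals

gold = serve(transform(raw_events))
print(gold)  # {'u1': 23.49, 'u2': 5.0}
```

The same bronze/silver/gold layering applies whether the compute engine is Spark on Databricks or SQL in Snowflake; only the execution layer changes.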
Platform & Technology Leadership
- Lead the architectural design and implementation of Databricks-based data platforms
- Architect enterprise-grade Snowflake solutions for data warehousing and data sharing
- Define MLOps architecture using MLflow for model tracking, governance, and lifecycle management
- Ensure data architectures meet security, compliance, and governance requirements
Engineering & Delivery Support
- Provide hands-on guidance and architectural oversight to data engineering teams
- Collaborate with data scientists and ML engineers to enable AI/ML workflows
- Review and guide implementation of data pipelines, infrastructure, and CI/CD practices
- Support performance tuning, cost optimization, and scalability planning
Operational Excellence
- Establish best practices for reliability, observability, and monitoring of data platforms
- Drive automation, testing, version control, and CI/CD standards across data solutions
- Proactively identify architectural risks and propose mitigation strategies
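As a sketch of the observability practices listed above, a pipeline run might emit simple health checks such as row counts and data freshness. The thresholds and check logic here are invented for illustration, not a prescribed standard:

```python
from datetime import datetime, timedelta, timezone

def check_pipeline_health(row_count, last_loaded_at,
                          min_rows=1, max_staleness=timedelta(hours=24)):
    """Return a list of alert strings for a table load; an empty list means healthy."""
    alerts = []
    if row_count < min_rows:
        alerts.append(f"row_count {row_count} below minimum {min_rows}")
    staleness = datetime.now(timezone.utc) - last_loaded_at
    if staleness > max_staleness:
        alerts.append(f"data is stale by {staleness}")
    return alerts

# Healthy load: rows present and loaded within the last hour.
recent = datetime.now(timezone.utc) - timedelta(hours=1)
print(check_pipeline_health(1000, recent))  # []

# Unhealthy load: empty table, two days old -> two alerts.
old = datetime.now(timezone.utc) - timedelta(days=2)
print(len(check_pipeline_health(0, old)))  # 2
```

Checks like these would typically feed a monitoring system rather than stdout, but the shape of the logic is the same.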
Innovation & Productivity
- Actively use AI coding agents (AMP, Claude Code) for architecture ideation, code generation, design validation, and quality improvement
- Promote modern engineering practices and AI-assisted development across teams
Required Skills & Qualifications
Core Expertise
- Expert-level experience in Cloud Data Architecture and Engineering
- Strong hands-on and architectural expertise in:
  - Databricks
  - Azure cloud technology stack
  - Snowflake
  - MLflow / MLOps
Technical Skills
- Strong programming skills in Python and/or Scala
- Deep understanding of:
  - Distributed data processing
  - Cloud-native data platforms
  - ETL / ELT architecture patterns
  - Lakehouse, warehouse, and hybrid data architectures
- Proven experience architecting secure, scalable, and production-grade data platforms
AI & Modern Development
- Must-have: Proven, hands-on experience using AI coding agents such as AMP and Claude Code
- Ability to integrate AI-assisted development into daily architectural and engineering workflows
Soft Skills
- Strong analytical and problem-solving capabilities
- Excellent collaboration and stakeholder communication skills
- Ability to translate business and data science requirements into robust technical architectures