What are the responsibilities and job description for the Senior Cloud Data Engineer position at Capgemini?
Role: Senior Cloud Data Engineer
Location: New York, NY (in-person interview mandatory)
Employment type: FTE
Overview
We are seeking a Sr. Cloud Data Engineer to design, build, and maintain scalable data architectures that support the development, deployment, and governance of AI solutions for the Model Risk Management (MRM) function.
The role focuses on modern data engineering in cloud environments to enable model validation, monitoring, reporting, documentation automation, and overall AI-driven MRM capabilities.
The ideal candidate has strong cloud, data engineering, and automation skills, along with an understanding of model risk concepts and regulatory expectations.
Key Responsibilities
1. Cloud Data Engineering
- Architect and develop secure, scalable cloud-based data pipelines to support MRM AI initiatives.
- Build ETL/ELT workflows to ingest data from:
  - Model inventory platforms
  - Model validation repositories
  - Risk systems
  - Documentation and unstructured repositories
- Optimize data storage, compute, and access layers across cloud-native services.
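As a sense of the ETL/ELT work described above, here is a minimal, hypothetical batch pipeline using only the Python standard library. The source data, field names, and table layout are illustrative assumptions, not details from the posting; a production pipeline would use the cloud-native services listed under Required Qualifications.

```python
import csv
import io
import sqlite3

# Hypothetical sample of model-inventory records (illustrative only).
RAW = """model_id,name,tier,status
M-001,Credit PD Model,1,validated
M-002,Fraud Score,2,pending
M-003,LGD Model,1,validated
"""

def extract(raw: str) -> list[dict]:
    """Parse raw CSV text into row dictionaries (the 'E' step)."""
    return list(csv.DictReader(io.StringIO(raw)))

def transform(rows: list[dict]) -> list[tuple]:
    """Keep only validated models and normalize types (the 'T' step)."""
    return [
        (r["model_id"], r["name"], int(r["tier"]))
        for r in rows
        if r["status"] == "validated"
    ]

def load(records: list[tuple]) -> sqlite3.Connection:
    """Load transformed records into an in-memory table (the 'L' step)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE model_inventory (model_id TEXT, name TEXT, tier INTEGER)"
    )
    conn.executemany("INSERT INTO model_inventory VALUES (?, ?, ?)", records)
    conn.commit()
    return conn

conn = load(transform(extract(RAW)))
count = conn.execute("SELECT COUNT(*) FROM model_inventory").fetchone()[0]
print(count)  # number of validated models loaded
```

The same extract/transform/load shape applies whether the sink is SQLite, Synapse, Redshift, or BigQuery; only the connectors change.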
2. Collaboration & Stakeholder Engagement
- Work closely with MRM stakeholders (Model Validation, Model Governance, Internal Audit, Compliance) to understand data needs.
- Collaborate with AI/ML Engineers and Data Scientists to support end‑to‑end model lifecycle workflows.
- Work with technology teams to align with enterprise cloud, security, and data governance frameworks.
3. Automation & Process Enhancement
- Automate manual MRM tasks using cloud-native tools, Python, and workflow orchestrators.
- Support deployment of GenAI-powered applications such as:
  - Document summarization
  - Risk narrative generation
  - Model documentation automation
  - Intelligent search on MRM repositories
Required Qualifications
Technical Skills
- Strong experience in cloud platforms (Azure preferred; AWS / GCP also acceptable):
  - Azure Data Factory / Synapse / Databricks
  - AWS Glue / EMR / Redshift
  - GCP Dataflow / BigQuery
- Deep knowledge of Python, SQL, and distributed data processing (Spark / PySpark).
- Experience with:
  - Data lake and warehouse architectures
  - Streaming & batch ingestion
  - Workflow orchestration (Airflow, ADF, Step Functions)
  - Containerization (Docker, Kubernetes)
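The orchestrators named above (Airflow, ADF, Step Functions) all reduce to the same core idea: running a graph of dependent tasks in topological order. A minimal standard-library sketch of that idea follows; the task names are hypothetical, not from the posting.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph for a nightly MRM data pipeline.
# Each key is a task; its set lists the tasks it depends on.
dag = {
    "ingest_inventory": set(),
    "ingest_validations": set(),
    "build_warehouse": {"ingest_inventory", "ingest_validations"},
    "publish_report": {"build_warehouse"},
}

# Resolve a valid execution order: dependencies always run first.
order = list(TopologicalSorter(dag).static_order())
print(order)
```

In Airflow this graph would be expressed as operators wired with `>>`; in Step Functions, as a state machine definition. The dependency-ordering logic is the common denominator.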
Domain Knowledge
- Foundational understanding of Model Risk Management concepts, including:
  - Model inventory
  - Model validation
  - Governance workflows
  - Regulatory expectations for model risk
- Experience working in banking, financial services, or other regulated industries preferred.
Soft Skills
- Strong communication and documentation abilities.
- Ability to work with cross-functional risk and technology teams.
- Analytical mindset and strong problem‑solving skills.