What are the responsibilities and job description for the Data Engineer (AWS Glue, Airflow, Lambda, Postgres) (contract) position at Capgemini?
Data Engineer (AWS Glue, Airflow, Lambda, Postgres)
Role Overview
We are seeking a skilled Data Engineer with strong hands-on experience in AWS-native data services, particularly Glue, Lambda, EventBridge, and workflow orchestration using Airflow. The role focuses on building scalable, event-driven data pipelines and supporting data processing and analytics on AWS and Postgres platforms.
Key Responsibilities
- Design and develop scalable data pipelines using AWS Glue, Lambda, and EventBridge for event-driven and batch processing.
- Build and maintain AWS Glue ETL jobs (PySpark/Python) for data ingestion, transformation, and curation across data lake layers.
- Develop and manage Airflow DAGs (MWAA or self-managed) to orchestrate data workflows, triggers, and dependencies across AWS services.
- Implement event-driven architectures using AWS Lambda and EventBridge for near real-time data processing.
- Write, optimize, and maintain complex SQL queries in Postgres for data validation, transformation, and reporting.
- Manage metadata and schema definitions using AWS Glue Data Catalog, ensuring proper governance and discoverability.
- Build and support robust data ingestion frameworks for batch and near real-time data pipelines.
- Monitor and troubleshoot pipeline performance using AWS monitoring tools (CloudWatch, logs, alerts).
- Collaborate with downstream teams (BI, analytics, and Snowflake if applicable) to ensure reliable data consumption.
- Contribute to system reliability, scalability, and performance optimization of the data platform.
Qualifications
- 6 years of experience in Data Engineering or a related field.
- Strong hands-on expertise in AWS Glue, Lambda, EventBridge, and S3.
- Experience building and orchestrating workflows using Apache Airflow (MWAA preferred).
- Strong proficiency in SQL (Postgres preferred) for complex transformations and performance tuning.
- Experience with Python / PySpark for ETL development.
- Solid understanding of event-driven data architecture and pipeline design.
- Familiarity with AWS IAM, CloudWatch, and logging/monitoring frameworks.
- Experience with CI/CD pipelines (e.g., GitHub Actions) for deployment automation.
- Experience with Terraform or infrastructure-as-code tools.
- Knowledge of data lake formats (Parquet, Iceberg, Delta).
- Understanding of data modeling, partitioning, and schema evolution.
- Familiarity with AWS services such as Lake Formation, SNS, and CloudTrail.
- Exposure to Snowflake or other data warehouse platforms is a plus.
Salary: $40 - $62