What are the responsibilities and job description for the Data Engineer - GCP position at CoreAi Consulting?
We are looking for a GCP Data Engineer with 5 years of experience to build scalable, reliable data pipelines across the full data lifecycle (ingestion, transformation, orchestration, and serving) using GCP services and Apache Spark.
The ideal candidate has strong big data fundamentals, hands-on PySpark experience, expertise in both batch processing and real-time streaming, and a focus on performance optimization and clean data modeling.
Key Responsibilities
- Design, build, and maintain batch and streaming pipelines using Dataflow, Dataproc, Pub/Sub, and Cloud Composer
- Develop scalable data transformations using PySpark and Spark SQL
- Implement data ingestion from databases, APIs, files, and event streams
- Build real-time streaming solutions with Pub/Sub and Dataflow (Apache Beam), including windowing, late-data handling, and watermarking (see the Beam sketch after this list)
- Design event-driven architectures for real-time analytics
- Orchestrate workflows using Cloud Composer (Airflow DAGs); a minimal DAG sketch follows this list
- Optimize Spark jobs by addressing data skew, shuffle issues, memory usage, and partitioning strategies (a salting sketch follows this list)
- Design and manage BigQuery schemas using dimensional modeling and lakehouse patterns
- Implement data quality, validation, and monitoring within pipelines (see the validation sketch after this list)
- Collaborate with stakeholders to translate business needs into data models
- Maintain documentation, runbooks, and follow Agile and CI/CD practices
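To give a flavor of the streaming work above, here is a minimal Apache Beam sketch of event-time windowing with a watermark trigger and a late-data allowance. The Pub/Sub topic path is a hypothetical placeholder, and the per-window count is purely illustrative.

```python
# A minimal sketch, not production code; the topic path is a placeholder.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.transforms import trigger, window

def run():
    options = PipelineOptions(streaming=True)
    with beam.Pipeline(options=options) as p:
        (
            p
            | "ReadEvents" >> beam.io.ReadFromPubSub(
                topic="projects/my-project/topics/events"  # hypothetical topic
            )
            | "WindowInto" >> beam.WindowInto(
                window.FixedWindows(60),  # 1-minute event-time windows
                trigger=trigger.AfterWatermark(
                    late=trigger.AfterProcessingTime(30)  # re-fire for late data
                ),
                allowed_lateness=600,  # accept data up to 10 min past the watermark
                accumulation_mode=trigger.AccumulationMode.ACCUMULATING,
            )
            | "KeyAll" >> beam.Map(lambda _msg: ("all", 1))
            | "CountPerWindow" >> beam.CombinePerKey(sum)
            | "Print" >> beam.Map(print)
        )

if __name__ == "__main__":
    run()
```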
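The orchestration responsibility could look like the following Cloud Composer (Airflow) DAG sketch; the DAG id, task callables, and schedule are illustrative assumptions, not an actual pipeline.

```python
# A minimal DAG sketch; task bodies and schedule are illustrative only.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    print("pull raw data from source systems")

def transform():
    print("run PySpark / SQL transformations")

def load():
    print("load curated tables into BigQuery")

with DAG(
    dag_id="example_daily_pipeline",  # hypothetical DAG id
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    load_task = PythonOperator(task_id="load", python_callable=load)

    extract_task >> transform_task >> load_task
```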
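For the Spark optimization point, one common skew mitigation is key salting. This PySpark sketch spreads a hot join key across extra buckets; the dataframes and the bucket count are hypothetical.

```python
# A minimal PySpark sketch; the dataframes and bucket count are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("skew-salting-demo").getOrCreate()

# A heavily skewed fact table: every row shares the same join key.
facts = spark.range(1_000_000).withColumn("customer_id", F.lit(42))
dims = spark.createDataFrame([(42, "ACME")], ["customer_id", "name"])

SALT_BUCKETS = 16

# Salt the skewed side: append a random bucket to the join key so the
# hot key's rows are spread across SALT_BUCKETS shuffle partitions.
salted_facts = facts.withColumn("salt", (F.rand() * SALT_BUCKETS).cast("int"))

# Explode the small side so every salt bucket finds a matching dim row.
salted_dims = dims.crossJoin(
    spark.range(SALT_BUCKETS).withColumnRenamed("id", "salt")
)

joined = salted_facts.join(salted_dims, ["customer_id", "salt"]).drop("salt")
print(joined.count())  # 1,000,000 rows, without one giant shuffle partition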
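And for in-pipeline data quality, a minimal PySpark sketch of null and duplicate checks that fail the task when thresholds are breached; the sample data, columns, and thresholds are assumptions for illustration.

```python
# A minimal sketch; the sample data and thresholds are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("dq-checks-demo").getOrCreate()

df = spark.createDataFrame(
    [("o1", "c1", 10.0), ("o2", None, 5.0), ("o2", "c2", 5.0)],
    ["order_id", "customer_id", "amount"],
)

total = df.count()
null_keys = df.filter(F.col("customer_id").isNull()).count()
duplicates = total - df.dropDuplicates(["order_id"]).count()

# Fail the task loudly if the batch breaches agreed thresholds, so the
# orchestrator (e.g. Composer) can alert and stop downstream loads.
# (This sample data intentionally trips both checks.)
if null_keys / total > 0.01:
    raise ValueError(f"{null_keys}/{total} rows missing customer_id")
if duplicates > 0:
    raise ValueError(f"{duplicates} duplicate order_id values found")
```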
Qualifications and Required Skills
- 5 years of experience in Data Engineering with GCP, Python, PySpark, and SQL
- Strong expertise in BigQuery (advanced SQL, partitioning, clustering, cost optimization); see the table-creation sketch after this list
- Experience with Dataflow (Apache Beam) for batch and streaming pipelines
- Hands-on experience with Dataproc / Apache Spark (PySpark, Spark SQL, performance tuning)
- Experience with Pub/Sub (event design, delivery semantics, deduplication)
- Experience with Cloud Composer (Airflow) for workflow orchestration
- Experience with Cloud Storage integration and lifecycle management
- Strong understanding of distributed data processing concepts (partitioning, shuffling, fault tolerance)
- Solid understanding of Spark internals (execution model, DAGs, Catalyst optimizer, Spark UI debugging)
- Familiarity with data formats such as Parquet, ORC, Avro, and Delta Lake
- Knowledge of streaming concepts (windowing, triggers, exactly-once vs at-least-once processing)
- Experience with data modeling (star/snowflake schemas, SCD Type 1/2); an SCD Type 2 MERGE sketch follows this list
- Experience with lakehouse architectures (Delta Lake/Iceberg with BigQuery)
- Experience with incremental data loads and large-scale partitioned datasets (see the watermark sketch after this list)
- Experience with performance tuning (joins, partitioning, caching, file sizing)
- Familiarity with CI/CD pipelines (Cloud Build or GitHub Actions)
- Experience using Git for version control
- Basic knowledge of Terraform for infrastructure management
- Experience building Python-based APIs (e.g., FastAPI) for data services (see the FastAPI sketch after this list)
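To illustrate the BigQuery partitioning and clustering expectation, a minimal sketch using the google-cloud-bigquery client; the project, dataset, table, and schema are hypothetical.

```python
# A minimal sketch; project, dataset, table, and schema are hypothetical.
from google.cloud import bigquery

client = bigquery.Client()

table = bigquery.Table(
    "my-project.analytics.events",
    schema=[
        bigquery.SchemaField("event_ts", "TIMESTAMP"),
        bigquery.SchemaField("customer_id", "STRING"),
        bigquery.SchemaField("amount", "NUMERIC"),
    ],
)

# Partition by day on event_ts so date-filtered queries prune partitions...
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY,
    field="event_ts",
)
# ...and cluster by customer_id to cut bytes scanned on keyed lookups.
table.clustering_fields = ["customer_id"]

client.create_table(table, exists_ok=True)
```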
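For the SCD Type 2 requirement, a minimal sketch of a Type 2 upsert as a BigQuery MERGE issued from Python. Table and column names are hypothetical, and the re-insert of the new version for changed rows (usually done via a staging union) is omitted for brevity.

```python
# A minimal sketch; tables and columns are hypothetical, and the re-insert
# of the new version for changed rows is omitted for brevity.
from google.cloud import bigquery

client = bigquery.Client()

scd2_close_out = """
MERGE `my-project.dw.dim_customer` AS tgt
USING `my-project.staging.customer_updates` AS src
ON tgt.customer_id = src.customer_id AND tgt.is_current
WHEN MATCHED AND tgt.address != src.address THEN
  -- Close out the current version of a changed row.
  UPDATE SET is_current = FALSE, valid_to = CURRENT_TIMESTAMP()
WHEN NOT MATCHED THEN
  -- Brand-new customer: insert as the open, current version.
  INSERT (customer_id, address, valid_from, valid_to, is_current)
  VALUES (src.customer_id, src.address, CURRENT_TIMESTAMP(), NULL, TRUE)
"""
client.query(scd2_close_out).result()
```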
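The incremental-load skill usually reduces to a watermark pattern like this PySpark sketch; the GCS paths, column names, and hard-coded watermark are placeholders (a real job would read and persist the watermark in a metadata store).

```python
# A minimal sketch; paths, columns, and the hard-coded watermark are
# placeholders; a real job reads/persists the watermark in a metadata store.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("incremental-load-demo").getOrCreate()

# High-water mark from the previous successful run (assumed value).
last_watermark = "2024-01-01 00:00:00"

incoming = (
    spark.read.parquet("gs://my-bucket/raw/orders/")  # hypothetical path
    .filter(F.col("updated_at") > F.to_timestamp(F.lit(last_watermark)))
)

# Append only the new slice, partitioned by date for cheap pruning later.
(
    incoming.withColumn("load_date", F.to_date("updated_at"))
    .write.mode("append")
    .partitionBy("load_date")
    .parquet("gs://my-bucket/curated/orders/")  # hypothetical path
)

# Compute the next high-water mark (persisting it is left out here).
next_watermark = incoming.agg(F.max("updated_at")).first()[0]
print("next watermark:", next_watermark)
```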
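Finally, for Python-based data APIs, a minimal FastAPI sketch serving query results from BigQuery; the route, table, and columns are assumptions for illustration.

```python
# A minimal sketch; the route, table, and columns are hypothetical.
from fastapi import FastAPI, HTTPException
from google.cloud import bigquery

app = FastAPI()
client = bigquery.Client()

@app.get("/customers/{customer_id}/orders")
def get_orders(customer_id: str, limit: int = 100):
    # Parameterized query: never interpolate user input into SQL strings.
    query = """
        SELECT order_id, order_ts, amount
        FROM `my-project.dw.fct_orders`
        WHERE customer_id = @customer_id
        ORDER BY order_ts DESC
        LIMIT @limit
    """
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("customer_id", "STRING", customer_id),
            bigquery.ScalarQueryParameter("limit", "INT64", limit),
        ]
    )
    rows = [dict(row) for row in client.query(query, job_config=job_config).result()]
    if not rows:
        raise HTTPException(status_code=404, detail="no orders for this customer")
    return rows
```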