What are the responsibilities and job description for the Google Cloud Platform Data Engineer position at Data Capital Inc?
Key Responsibilities:
- Design, develop, and maintain scalable ETL/ELT pipelines for batch and real-time data processing.
- Build and optimize data solutions using Apache Spark (PySpark) and streaming technologies.
- Develop and manage data lakes, data warehouses, and cloud-based data platforms on Google Cloud Platform.
- Optimize Spark jobs, SQL queries, and data workflows for performance and cost efficiency.
- Implement data quality, monitoring, and alerting frameworks.
Required Skills:
- Strong experience in Big Data technologies, especially Apache Spark and PySpark.
- Expertise in ETL processes and workflow orchestration using Apache Airflow.
- Strong programming skills in Python and SQL.
- 4 years of hands-on experience with Google Cloud Platform services (e.g., BigQuery, Pub/Sub, Dataflow).
- Experience with real-time data streaming and scalable data architectures.
Nice to Have:
- Scala/Java knowledge.
- Experience with Kafka, Databricks, Docker, Kubernetes, Snowflake, or NoSQL databases.