What are the responsibilities and job description for the GCP Data Engineer position at Data Capital Incorporation?
About the Role:
We are looking for a highly experienced Senior Data Engineer with strong expertise in real-time data processing and scalable data architectures. You will play a key role in designing, building, and optimizing data platforms that support analytics, reporting, and machine learning use cases. You will work closely with cross-functional teams (Data Science, Analytics, Product) to deliver high-performance data infrastructure and tools.
Professional Experience:
- Minimum 12 years of industry experience building enterprise data solutions.
- 8 years of recent, hands-on experience with Google Cloud Platform data services.
- Proven track record of delivering productionized data platforms supporting analytics and ML.
Key Responsibilities:
- Design & Build Data Pipelines: Architect, develop, and maintain robust ETL/ELT workflows for batch and real-time data ingestion and processing using Apache Spark (PySpark/Scala) and streaming technologies.
- Real-Time Streaming: Implement and manage scalable streaming platforms using Apache Kafka or similar messaging systems such as Google Pub/Sub, together with stream-processing frameworks like Apache Flink, ensuring reliable, low-latency data flow (see the streaming sketch after this list).
- Optimize Data Workloads: Tune Spark jobs, streaming processes, table schemas, and SQL queries to maximize performance, minimize cost, and ensure efficient resource utilization.
- Architect Scalable Data Systems: Build and maintain modern data architectures including data lakes, data warehouses (BigQuery), and metadata frameworks that support analytical and ML workloads.
- Data Quality & Monitoring: Implement automated data quality checks, monitoring dashboards, alerts, and self-healing workflows to maintain high-fidelity data (a minimal check is sketched after this list).
- Cloud & DevOps Integration: Collaborate with Cloud and DevOps teams to deploy solutions leveraging Google Cloud Platform services, containerization (Docker), and orchestration tools (Kubernetes).
- Documentation & Best Practices: Maintain technical documentation, enforce data governance standards, and advocate for best practices in data engineering.
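To illustrate the kind of pipeline the streaming responsibilities above describe, here is a minimal PySpark Structured Streaming sketch that consumes JSON events from a Kafka topic. The broker address, topic name, and event schema are hypothetical placeholders, and the console sink stands in for a production BigQuery or GCS sink.

    # Minimal sketch: stream JSON events from Kafka and print parsed rows.
    # Requires the spark-sql-kafka-0-10 package on the Spark classpath.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import StringType, StructField, StructType, TimestampType

    spark = SparkSession.builder.appName("events-stream").getOrCreate()

    # Hypothetical event schema; replace with the real payload structure.
    event_schema = StructType([
        StructField("event_id", StringType()),
        StructField("user_id", StringType()),
        StructField("event_time", TimestampType()),
    ])

    raw = (
        spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
        .option("subscribe", "events")                     # placeholder topic
        .load()
    )

    # Kafka delivers values as bytes; decode and parse the JSON payload.
    events = (
        raw.select(from_json(col("value").cast("string"), event_schema).alias("e"))
        .select("e.*")
    )

    query = (
        events.writeStream
        .format("console")  # swap for a BigQuery/GCS sink in production
        .outputMode("append")
        .option("checkpointLocation", "/tmp/checkpoints/events")  # needed for fault tolerance
        .start()
    )
    query.awaitTermination()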
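Similarly, the data quality responsibility can reduce, at its simplest, to automated checks like the following PySpark sketch. The input path, required columns, and fail-fast behavior are assumptions for illustration; in practice the result would feed dashboards and alerting.

    # Minimal data-quality sketch: count nulls in required columns and fail
    # fast when any are found. Path and column names are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, count, when

    spark = SparkSession.builder.appName("dq-checks").getOrCreate()
    df = spark.read.parquet("gs://example-bucket/events/")  # placeholder path

    required = ["event_id", "user_id", "event_time"]
    null_counts = df.select(
        [count(when(col(c).isNull(), c)).alias(c) for c in required]
    ).first()

    violations = {c: null_counts[c] for c in required if null_counts[c] > 0}
    if violations:
        # In production this would raise an alert or write to a monitoring table.
        raise ValueError(f"Null-value violations detected: {violations}")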
Qualifications:
- Programming: Strong proficiency in Python and SQL, with working knowledge of Scala or Java.
- Big Data Frameworks: Expertise in Apache Spark (Spark SQL, DataFrames, Structured Streaming).
- Streaming Technologies: Hands-on experience with Apache Kafka, Google Pub/Sub, or similar systems.
- Cloud Platforms: Solid experience with Google Cloud Platform (GCP) data services (BigQuery, Dataflow, Pub/Sub, Dataproc, etc.); see the BigQuery sketch after this list.
- Data Stores: Experience with data warehousing solutions such as BigQuery, Snowflake, or Redshift, and familiarity with NoSQL databases.
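As a small illustration of the GCP tooling listed above, here is a sketch using the google-cloud-bigquery Python client to run a parameterized query. The project, dataset, and table names are hypothetical placeholders.

    # Minimal BigQuery sketch: run a parameterized query and iterate results.
    # Project, dataset, and table names below are hypothetical.
    import datetime
    from google.cloud import bigquery

    client = bigquery.Client(project="example-project")  # placeholder project

    query = """
        SELECT user_id, COUNT(*) AS events
        FROM `example-project.analytics.events`
        WHERE event_time >= @since
        GROUP BY user_id
        ORDER BY events DESC
        LIMIT 10
    """
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter(
                "since", "TIMESTAMP",
                datetime.datetime(2024, 1, 1, tzinfo=datetime.timezone.utc),
            ),
        ]
    )

    # result() blocks until the job completes, then yields rows.
    for row in client.query(query, job_config=job_config).result():
        print(row.user_id, row.events)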