What are the responsibilities and job description for the Big Data Engineer position at Purple Drive Technologies LLC?
Job Details
Job Title: Big Data Engineer
Location: Atlanta, GA / Tampa, FL / Dallas, TX
Job Summary
We are seeking an experienced Big Data Engineer to design, build, and optimize large-scale data pipelines and distributed systems across cloud and on-prem platforms. The ideal candidate will have strong expertise in Spark, Hadoop ecosystems, cloud data services, ETL/ELT design, streaming platforms, and best practices for scalable data processing.
Responsibilities
- Design, develop, and maintain big data pipelines using Spark (PySpark/Scala), Hadoop, Kafka, and distributed computing frameworks.
- Build and optimize ETL/ELT pipelines for structured and unstructured data across cloud and on-prem data platforms.
- Work with cloud technologies (AWS, Azure, or Google Cloud Platform) including Data Lake, Databricks, EMR, Glue, Dataflow, Synapse, or Snowflake.
- Develop Delta Lake / Lakehouse architecture for high-performance ingestion and processing.
- Build real-time streaming solutions using Kafka, Spark Streaming, Kinesis, or Event Hub.
- Collaborate with data architects, analysts, and application teams to gather data requirements and deliver scalable solutions.
- Implement and maintain CI/CD pipelines, automated jobs, and orchestration using Airflow, Azure Data Factory, or Glue Workflows.
- Optimize data pipelines for performance, cost efficiency, and reliability.
- Ensure data quality, validation, governance, and lineage best practices across the ecosystem.
- Troubleshoot and resolve production issues in a high-availability environment.
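To illustrate the ETL/ELT and data-quality responsibilities above, here is a minimal Python sketch of a single record-cleaning step of the kind such a pipeline might apply. The function and field names (`clean_records`, `id`, `amount`) are hypothetical, and in practice this logic would typically run as a Spark (PySpark) transformation over a distributed dataset rather than an in-memory list:

```python
# Minimal, illustrative sketch of record-level validation and
# de-duplication, the kind of cleaning step an ETL/ELT pipeline
# applies before loading data downstream. All names are hypothetical.

def clean_records(records):
    """Drop records missing required fields, cast amounts to float,
    and de-duplicate on the 'id' key (keeping the first occurrence)."""
    seen = set()
    cleaned = []
    for rec in records:
        # Validation: skip records missing required fields.
        if rec.get("id") is None or rec.get("amount") is None:
            continue
        # De-duplication on the primary key.
        if rec["id"] in seen:
            continue
        seen.add(rec["id"])
        cleaned.append({"id": rec["id"], "amount": float(rec["amount"])})
    return cleaned

# Example usage with a small in-memory batch:
raw = [
    {"id": 1, "amount": "10.5"},
    {"id": 1, "amount": "10.5"},   # duplicate id
    {"id": 2, "amount": None},     # fails validation
    {"id": 3, "amount": "7"},
]
print(clean_records(raw))
```

In a real pipeline the same validate-then-deduplicate pattern would be expressed with DataFrame operations (e.g. filters and drop-duplicates) so it can scale across a cluster.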
Required Skills & Qualifications
- 5-8 years of experience as a Big Data Engineer or Data Engineer.
- Strong hands-on skills with Spark (PySpark or Scala), Hadoop, Hive, HDFS, MapReduce.
- Experience working with Databricks, EMR, Glue, Synapse, DataProc, or equivalent big-data compute engines.
- Proficiency in Python or Scala for data engineering.
- Experience with Kafka or other event-streaming technologies.
- Strong understanding of cloud data architectures (AWS S3/Glue/EMR | Azure ADLS/ADF/Databricks | Google Cloud Platform BigQuery/DataProc).
- Solid SQL skills and experience with relational and NoSQL databases.
- Experience with version control (Git) and CI/CD tools (Jenkins, Azure DevOps, GitHub Actions).
- Hands-on experience with Airflow, ADF, or other orchestration and scheduling tools.
- Familiarity with data modeling, data governance, and best practices for data quality.