What are the responsibilities and job description for the Data Engineer with Databricks position at Jobs via Dice?
Dice is the leading career destination for tech experts at every stage of their careers. Our client, Apton Inc, is seeking the following. Apply via Dice today!
Job Description:
Design, build, and optimize robust, scalable pipelines in Databricks (PySpark, SQL, Delta Lake) for structured, semi-structured, and unstructured data; a minimal sketch of such a pipeline follows this list.
Ingest and process data from diverse sources: relational databases, APIs, PDFs, Excel files, flat files, and scraped web pages.
Implement data quality frameworks, validation checks, and automated tests to ensure reliability across the pipeline lifecycle.
Conduct UAT/UAD processes to align outputs with business requirements and ensure trust in data.
Own pipelines end-to-end, from development through deployment, monitoring, scaling, and continuous improvement.
Apply best practices for dataset versioning, reproducibility, and lineage (Delta Lake, MLflow, or equivalent).
Optimize pipelines for large-scale performance and cost-efficiency in Databricks and AWS environments.
Collaborate with data science and analytics teams to deliver model-ready datasets and ensure schema stability.
Produce clear technical documentation (design docs, lineage diagrams, data dictionaries) and communicate effectively with technical and business stakeholders.
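For context, below is a minimal sketch of the kind of ingest-validate-publish job the responsibilities above describe, using PySpark and Delta Lake on Databricks. It is illustrative only: the source path, target table, and column names (order_id, order_ts, amount) are hypothetical, not taken from the posting.

```python
from pyspark.sql import SparkSession, functions as F

# Hypothetical locations, for illustration only.
RAW_PATH = "s3://example-bucket/raw/orders/"   # assumed raw landing zone
TARGET_TABLE = "analytics.orders_silver"       # assumed Delta table name

spark = SparkSession.builder.appName("orders_pipeline").getOrCreate()

# Ingest: read semi-structured JSON dropped into the raw zone.
raw = spark.read.json(RAW_PATH)

# Transform: normalize types and drop rows that can never be valid.
clean = (
    raw
    .withColumn("order_ts", F.to_timestamp("order_ts"))
    .withColumn("amount", F.col("amount").cast("decimal(12,2)"))
    .filter(F.col("order_id").isNotNull())
)

# Validate: a simple quality gate before publishing downstream.
dupes = clean.groupBy("order_id").count().filter("count > 1").count()
if dupes > 0:
    raise ValueError(f"{dupes} duplicate order_id values; aborting load")

# Publish: append into a Delta table, which provides ACID writes plus
# table versioning and time travel (e.g., the versionAsOf read option),
# one common way to support the reproducibility and lineage practices
# mentioned above.
clean.write.format("delta").mode("append").saveAsTable(TARGET_TABLE)
```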
Required Qualifications
6 years of professional experience in Data Engineering or related roles.
Hands-on expertise in Databricks (PySpark, SQL, Delta Lake).
Proven track record of building and deploying end-to-end production pipelines at scale.
Experience working with structured and unstructured data, including extracting information from PDFs, Excel files, APIs, and scraped web pages.
Strong foundation in data quality, testing, and observability (e.g., Great Expectations, Deequ, dbt tests, or similar); a brief testing sketch follows this list.
Ability to troubleshoot, debug, and optimize Spark/Databricks workloads for scale and performance.
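To make the testing expectation concrete, here is one minimal sketch of an automated check of the kind the last two qualifications describe: a plain pytest unit test run against a local Spark session. The clean_orders function and its columns are hypothetical; in practice a framework such as Great Expectations, Deequ, or dbt tests would typically supply these assertions.

```python
import pytest
from pyspark.sql import SparkSession, functions as F

@pytest.fixture(scope="session")
def spark():
    # Small local session so tests run without a cluster.
    return SparkSession.builder.master("local[2]").appName("tests").getOrCreate()

def clean_orders(df):
    # Hypothetical transformation under test: drop rows missing an
    # order_id and cast amount to a fixed-precision decimal.
    return (df.filter(F.col("order_id").isNotNull())
              .withColumn("amount", F.col("amount").cast("decimal(12,2)")))

def test_clean_orders_drops_null_ids(spark):
    raw = spark.createDataFrame(
        [("A-1", "10.50"), (None, "3.20")],
        ["order_id", "amount"],
    )
    result = clean_orders(raw)
    # Only the row with a valid order_id should survive.
    assert result.count() == 1
    assert result.filter(F.col("order_id").isNull()).count() == 0
```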