What are the responsibilities and job description for the Data Engineer – Data Lake Migration position at Jobs via Dice?
Dice is the leading career destination for tech experts at every stage of their careers. Our client, Value Technology Inc, is seeking the following. Apply via Dice today!
Job Title: Data Engineer – Data Lake Migration
Location: Dallas, TX (Onsite)
Experience: 8-10 years
Role Summary
Value Technology is seeking a Data Engineer to join a high-impact datastore migration initiative focused on moving data from on-premises Data Lakes to an AWS-based Lakehouse architecture. The role covers end-to-end data pipeline migration, transformation of legacy consumption patterns, and assurance of data quality and integrity across modern data platforms.
Key Responsibilities
Data Migration & Pipeline Engineering
- Refactor and migrate data pipelines, extraction logic, and job scheduling from legacy systems to modern Lakehouse architecture.
- Execute large-scale data transfers ensuring accuracy, completeness, and consistency.
- Work with file formats such as JSON, Avro, and Parquet for efficient data processing.
- Convert and optimize legacy SQL and Apache Spark-based workloads for modern platforms.
- Migrate and adapt datasets to Snowflake and Apache Iceberg environments.
- Analyze existing data usage patterns to design optimized data delivery solutions.
- Perform data validation and reconciliation to ensure migrated data matches production standards.
- Build and utilize reconciliation frameworks to validate data correctness and completeness.
- Act as a technical liaison between engineering teams and business stakeholders.
- Drive data hand-off and sign-off processes, ensuring alignment with business requirements.
- Provide regular updates and participate in stakeholder discussions.
- Collaborate with internal data platform teams to adopt new tools, workflows, and frameworks.
- Work with distributed systems and data storage frameworks such as Hadoop (HDFS/Hive).
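The validation and reconciliation responsibilities above can be sketched as a minimal, framework-agnostic check. This is an illustrative example only, not part of the role's actual toolchain: the function names are hypothetical, and the XOR-fold fingerprint is one simple way to compare datasets order-independently (note that it cannot distinguish even-multiplicity duplicates on its own, which is why the row-count check accompanies it; a production reconciliation framework would track per-row multiplicity).

```python
import hashlib
import json

def dataset_fingerprint(rows):
    """Order-independent fingerprint: hash each row, XOR the digests.

    Two datasets with the same rows in any order produce the same
    fingerprint; a missing, extra, or altered row changes it.
    Caveat: pairs of identical rows cancel out under XOR, so this
    sketch relies on the separate row-count check for that case.
    """
    acc = 0
    for row in rows:
        canonical = json.dumps(row, sort_keys=True).encode()
        acc ^= int.from_bytes(hashlib.sha256(canonical).digest(), "big")
    return acc

def reconcile(source_rows, target_rows):
    """Compare migrated data against the source on counts and content."""
    return {
        "source_count": len(source_rows),
        "target_count": len(target_rows),
        "counts_match": len(source_rows) == len(target_rows),
        "content_match": dataset_fingerprint(source_rows)
                         == dataset_fingerprint(target_rows),
    }
```

In practice the same idea (row counts plus content checksums, computed on both the legacy and the Lakehouse side) is what a reconciliation sign-off report is built from.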
Programming & Data Processing
- Strong proficiency in Python or Java
- Hands-on experience with Apache Spark
- Strong expertise in ANSI SQL
- Experience with:
  - Snowflake
  - Apache Iceberg
  - Hadoop (HDFS, Hive)
  - Kafka (data streaming)
  - Sybase IQ
- Knowledge of JSON, Avro, Parquet
- Experience with data ingestion mechanisms (e.g., FTP)
- Strong understanding of SDLC and CI/CD practices
- Experience with Kubernetes (K8s) deployments
- Temporal Data Modeling: Handling historical data and Slowly Changing Dimensions (e.g., SCD Type 2)
- Schema Management: Schema evolution and enforcement (especially with Iceberg)
- Performance Optimization: Partitioning, clustering, and query tuning
- Data Architecture:
  - Normalization vs. Denormalization
  - Natural vs. Surrogate Keys
- Strong troubleshooting and debugging skills (SQL & pipelines)
- Ability to quickly learn new tools, frameworks, and workflows
- Experience working with large-scale, distributed data systems
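As an illustration of the SCD Type 2 requirement above, here is a minimal in-memory sketch. The field names (`valid_from`, `valid_to`) and the plain-Python representation are assumptions for the example; in this role the same logic would typically run as a Spark job or a Snowflake MERGE against a dimension table.

```python
from datetime import date

def scd2_apply(dimension, incoming, effective_date):
    """Apply incoming records to an SCD Type 2 dimension.

    `dimension` is a list of dicts carrying an "id", attribute columns,
    and "valid_from"/"valid_to" dates ("valid_to" is None for the current
    version). Changed rows are closed out and a new current version is
    appended; unchanged rows are left untouched; new ids are inserted.
    """
    current = {r["id"]: r for r in dimension if r["valid_to"] is None}
    for rec in incoming:
        cur = current.get(rec["id"])
        if cur is None:
            # brand-new entity: insert as the current version
            dimension.append({**rec, "valid_from": effective_date,
                              "valid_to": None})
        elif any(cur[k] != v for k, v in rec.items() if k != "id"):
            # attribute change: close old version, open a new one
            cur["valid_to"] = effective_date
            dimension.append({**rec, "valid_from": effective_date,
                              "valid_to": None})
    return dimension
```

The same pattern (close the old row, append the new one, keep full history queryable by date range) is what Iceberg and Snowflake merge-based SCD pipelines implement at scale.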