What are the responsibilities and job description for the Data Engineer position at Avance Consulting?

Data Engineer (Hadoop & On-Premise Cloud)

Charlotte, North Carolina | Plano, Texas (Onsite)

Permanent/Full Time

Required Skills and Experience (Must-Have)

End-to-End Implementation: Proven experience in end-to-end project implementation using Hadoop (HDFS, Hive, HBase), PySpark, and On-Premise Cloud/Private Cloud infrastructures (e.g., OpenStack, VMware, or bare-metal clusters).

Core Hadoop Ecosystem: Strong knowledge and hands-on experience in HDFS, Hive, YARN, MapReduce, and HBase.

PySpark Development: Hands-on experience in designing and developing ingestion pipelines, ETL pipelines, and batch/streaming jobs using PySpark (must-have; Scala-Spark is a plus but not required).

On-Premise Cluster Management: Experience in on-premise cluster configuration, resource management (CPU/memory), NameNode/DataNode administration, and job scheduling using Apache Airflow, Oozie, or Control-M.

Performance Tuning: Proven ability in handling large datasets (TB/PB scale) and applying performance optimization techniques for data ingestion and retrieval in on-premise Hadoop environments (e.g., partitioning, bucketing, file format selection like Parquet/ORC).

Preferred Skills and Experience (Good-to-Have)

Software Engineering Best Practices: Knowledge of design patterns, data structures, algorithms, collections, multi-threading, memory management, and concurrency in Python.

Data Warehousing & SQL: Strong SQL skills for Hive/Impala queries and experience with traditional on-premise data warehouses.

Migration Experience: Experience migrating data from legacy on-premise systems (e.g., Oracle, Teradata, Netezza) to modern Hadoop-based data lakes on private cloud.

Workflow Management: Hands-on experience with workflow orchestration tools like Apache Airflow, NiFi, or Oozie for complex dependency management.

Visualization: Exposure to Power BI, Tableau, or QlikView connecting to Hadoop/Hive.

Agile Methodologies: Understanding of Scrum, Kanban, or other Agile frameworks.

Domain Knowledge: Experience in the Banking, Financial Services, or Insurance (BFSI) domain, including familiarity with regulatory reporting, data governance, or compliance (e.g., GDPR, BCBS).

Team Collaboration: Ability to work effectively in a diverse, multi-stakeholder environment comprising Business users, Data Scientists, and IT infrastructure teams.

Key Responsibilities

Design, build, and maintain scalable data pipelines using PySpark on on-premise Hadoop clusters.

Manage and optimize Hive metastore, HDFS storage, and YARN resource queues for multi-tenant workloads.

Implement data validation, error handling, and reconciliation mechanisms for batch and real-time data.

Collaborate with infrastructure teams to tune on-premise cloud resources (compute, storage, network) for Spark workloads.

Migrate existing ETL workflows from legacy systems to the Hadoop ecosystem.

Salary: $100,000
