What are the responsibilities and job description for the Hadoop Developer position at Covetus?
Title: Senior Hadoop Developer
Location: Charlotte, NC
Experience: Minimum 10 years
Primary skills: PySpark, Apache Kafka, Hadoop Ecosystem, Hive, Databricks Lakehouse Architecture, Delta Lake, Bronze/Silver/Gold Data Modeling, Big Data ETL Pipeline Development, SQL, Real-time Data Ingestion Frameworks, Data Governance & Cataloging, CI/CD Tools – Git, Jenkins, Bitbucket, Workflow Orchestration, and Cloud & On-Prem Big Data Platforms.
Roles & Responsibilities
Seeking a Senior Big Data Engineer with 10–13 years of experience specializing in Hadoop, PySpark, Kafka, and Hive, with strong experience designing data solutions for large-scale financial systems.
In addition, the candidate must possess advanced expertise in Databricks Lakehouse architecture, particularly around Bronze/Silver/Gold layer data modeling, Delta Lake optimizations, and building reliable, scalable pipelines for regulatory, risk, trading, and analytics workloads.
This role focuses on delivering highly performant, well-governed data platforms that support the bank’s mission-critical global markets functions.
Key Responsibilities:
Big Data Platform Engineering
- Design, develop, and optimize PySpark-based ETL pipelines running on on‑prem Hadoop clusters and cloud environments.
- Build high‑volume ingestion frameworks using Kafka for real-time and near-real-time trading and market data.
- Develop, tune, and manage Hadoop ecosystem components—HDFS, YARN, MapReduce, Tez, Oozie/Airflow.
- Build high-performance, optimized Hive data models for regulatory reporting, trade lifecycle, and market risk processing.
Databricks Lakehouse & Delta Framework
- Architect and implement Bronze/Silver/Gold layer modeling patterns within the Databricks Lakehouse.
- Apply Delta Lake best practices including:
- optimized file management
- Z-Ordering
- Delta Change Data Feed (CDF)
- schema evolution & enforcement
- ACID transaction handling
- Build reusable frameworks for ingestion, cleansing, transformation, and consumption of data across Lakehouse layers.
- Enable governance, lineage, and auditability using Unity Catalog or equivalent cataloging tools.
Collaboration, Leadership & Delivery
- Collaborate closely with quants, product owners, architects, risk tech, and business users.
- Participate in agile ceremonies — sprint planning, refinement, design reviews.
- Mentor junior engineers and contribute to building strong engineering practices across tech teams.
Required Skills & Experience
- 10–13 years of hands-on experience in Big Data engineering.
- Expert skills in:
- PySpark — dataframe optimizations, partitioning, broadcast strategies, distributed computing.
- Kafka — producer/consumer design, schema registry, streaming ETLs.
- Hadoop ecosystem — HDFS, YARN, MapReduce/Tez, Oozie/Airflow.
- Hive — advanced query tuning, Tez optimization, partition/bucket management.
- Extensive hands-on experience with Databricks Lakehouse, including:
- Bronze/Silver/Gold layer modeling
- Delta Lake optimizations
- Data quality frameworks on Lakehouse
- Structured & unstructured data handling
- Experience in Global Markets, Risk, Treasury, Trade Surveillance, or Regulatory Reporting.
- Strong SQL knowledge with experience working on massive datasets (TB/PB scale).
- Experience with CI/CD practices — Git, Jenkins, Bitbucket, build pipelines.