What are the responsibilities and job description for the Data Engineer/Fabric Developer position at New York Technology Partners?
Job Title: Fabric Developer / Sr. Data Engineer
Location: Phoenix, AZ (Onsite 5 days)
Client: Cognizant/Caterpillar
Requirement: 8 years of experience; Onsite 5 days per week.
Required Skills & Qualifications
- Core Expertise: 8 years in Data Engineering and architecture.
- Microsoft Fabric Stack: Expert proficiency in Data Factory (Fabric) for orchestration and Lakehouse management (Delta tables, files).
- Spark Engineering: Advanced hands-on experience with PySpark, Spark SQL, or Scala using Fabric Notebooks and Spark Job Definitions.
- OneLake / ADLS Gen2: Deep understanding of storage structures, Delta Lake format, partitioning strategies, and shortcuts.
- Pipeline Management: Expert in building ELT/ETL pipelines, monitoring performance via the Monitoring Hub, and implementing automation.
- Visualization: Experience creating intuitive reports and dashboards using Fabric’s integrated reporting tools.
- Domain: Previous experience in the Manufacturing domain is a strong plus.
- Education: Bachelor’s degree in Computer Science, Engineering, or a relevant field.
Role Overview
The Fabric Developer / Sr. Data Engineer will lead the design and implementation of an end-to-end data ecosystem on the Microsoft Fabric platform. Based onsite in Phoenix, this role owns the full data lifecycle, from ingesting data from diverse sources into OneLake to creating curated Delta tables for advanced analytics. You will work closely with Platform Architects and Report Developers to ensure high-performance data layouts, with a focus on code reusability, parameterization, and automated workflows. The ideal candidate is a self-driven engineer capable of optimizing Spark jobs for both performance and cost efficiency in a manufacturing-centric data environment.
Technical Core & Responsibilities
Data Ingestion & Lakehouse Architecture
- Orchestration: Build and maintain scalable ELT/ETL pipelines using Data Factory within Fabric, enabling efficient ingestion from multiple sources.
- Lakehouse Management: Create and manage Lakehouse structures as the primary landing and processing zones, utilizing Delta Lake formats for reliability.
- OneLake Optimization: Implement storage and retrieval strategies, including partitioning and shortcuts, to handle large-scale datasets (a minimal ingestion sketch follows this list).
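To make the ingestion pattern concrete, here is a minimal PySpark sketch of landing raw files as a partitioned Delta table in a Fabric Lakehouse. The source path, table name, and partition column are hypothetical, chosen only to illustrate the approach; they are not taken from the client's environment.

```python
# Minimal, illustrative sketch: the path, table, and column names below are
# hypothetical and not taken from the job description.
from pyspark.sql import SparkSession, functions as F

# Fabric notebooks provide a SparkSession automatically; building one here
# keeps the example self-contained.
spark = SparkSession.builder.appName("bronze_ingest").getOrCreate()

# Read raw files dropped into the Lakehouse Files area (hypothetical path).
raw = (
    spark.read
    .option("header", "true")
    .csv("Files/landing/sensor_readings/")
)

# Land the data as a partitioned Delta table in the Lakehouse Tables area;
# partitioning by load date supports pruning on incremental reads.
(
    raw.withColumn("ingest_date", F.current_date())
       .write
       .format("delta")
       .mode("append")
       .partitionBy("ingest_date")
       .saveAsTable("bronze_sensor_readings")
)
```

Partitioning by load date is one common choice; the right partition key depends on how downstream consumers filter the data.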
Transformation & Spark Processing
- Complex Engineering: Use PySpark and Spark Notebooks for sophisticated data cleansing, enrichment, and large-scale transformations directly on OneLake data (see the transformation sketch after this list).
- Performance Tuning: Optimize Spark Job Definitions and data layouts to ensure high performance and cost-effective processing.
- Quality & Governance: Apply data quality rules and governance principles to all pipelines and data structures to ensure curated, report-ready data.
Automation & Delivery
- Workflow Automation: Reduce manual intervention by implementing automated data processing and integration workflows.
- Collaborative Design: Partner with Report Developers to transform raw data into interactive, intuitive dashboards and usable analysis formats.
- Project Leadership: Act as a self-driven lead who carries data projects through the entire delivery lifecycle, ensuring code reusability and parameterization throughout (see the parameterized merge sketch below).
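As one illustration of reusable, parameterized automation (a sketch, not the client's actual design), the code below performs an idempotent Delta merge driven by values that the orchestrating Data Factory pipeline could pass to a notebook; all table and column names are hypothetical.

```python
# Minimal, illustrative sketch: parameters and table names are hypothetical.
# In Fabric, the orchestrating Data Factory pipeline would typically supply
# these values as notebook parameters; they are hard-coded here for clarity.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("incremental_merge").getOrCreate()

source_table = "silver_sensor_readings"    # parameter: curated input table
target_table = "gold_sensor_daily"         # parameter: serving table to upsert into
key_columns = ["device_id", "event_date"]  # parameter: business key for the merge

updates = spark.read.table(source_table)
condition = " AND ".join(f"t.{c} = s.{c}" for c in key_columns)

# Idempotent upsert: re-running the pipeline does not create duplicate rows,
# which keeps the automated workflow safe to retry without manual intervention.
(
    DeltaTable.forName(spark, target_table)
    .alias("t")
    .merge(updates.alias("s"), condition)
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)
```

Because the merge is keyed and idempotent, the same notebook can serve every source the pipeline points it at and can be retried after a failure without manual cleanup.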