What are the responsibilities and job description for the Sr Data Engineer position at Infinity Tech Group Inc?
Our client's team, composed of data scientists, AI engineers, and software engineers, drives innovation by developing advanced AI and data science solutions that enhance decision-making across the firm's financial advisory and asset management business lines. This team ensures our client stays at the forefront of a data-driven world, delivering insights that support client engagements and strengthen key partnerships, while keeping the firm competitive and efficient in an evolving financial landscape.
As a Data Engineer, you'll lead efforts to onboard and model datasets on modern cloud data platforms, delivering reliable pipelines and high-quality data layers that serve analytics, reporting, and ML/AI workloads.
Responsibilities:
- Ingest and model data from APIs, files/SFTP, and relational sources; implement layered architectures (raw/clean/serving) using Python, PySpark/SQL, and dbt.
- Design and operate pipelines with Prefect (or Airflow), including scheduling, retries, parameterization, SLAs, and well-documented runbooks.
- Build on cloud data platforms, leveraging S3/ADLS for storage and a Spark platform (e.g., Databricks or equivalent) for compute; manage jobs, secrets, and access.
- Publish governed data services and manage their lifecycle with Azure API Management (APIM): authentication/authorization, policies, versioning, quotas, and monitoring.
- Enforce data quality and governance through data contracts, validations/tests, lineage, observability, and proactive alerting.
- Optimize performance and cost via partitioning, clustering, query tuning, job sizing, and workload management.
- Uphold security and compliance (e.g., PII handling, encryption, masking) in line with firm standards.
- Collaborate with stakeholders (analytics, AI engineering, and business teams) to translate requirements into reliable, production-ready datasets.
- Enable AI/LLM use cases by packaging datasets and metadata for downstream consumption, integrating via Model Context Protocol (MCP) where appropriate.
- Continuously improve platform reliability and developer productivity by automating routine tasks, reducing technical debt, and maintaining clear documentation.
Requirements:
- Bachelor's or advanced degree in Computer Science, Data Engineering, or a related field.
- 15 years of professional data engineering experience.
- Strong skills in Python, SQL, and Spark (PySpark), and/or Kafka.
- Experience with Snowflake (Snowpipe, Tasks, Streams) as a complementary warehouse.
- Experience with Databricks (Delta formats, workflows, cataloging) or equivalent Spark platforms.
- Hands-on experience building ETL/ELT with Prefect (or Airflow), dbt, Spark, and/or Kafka.
- Experience onboarding datasets to cloud data platforms (storage, compute, security, governance).
- Familiarity with Azure/AWS/Google Cloud Platform data services (e.g., S3/ADLS; Redshift/BigQuery; Glue/ADF).
- Git-based workflows, CI/CD, and containerization with Docker (Kubernetes a plus).
Desired:
- Advanced APIM practices (custom policies, OAuth2/JWT, mTLS, private endpoints) and Azure AD integration.
- Integrating datasets into MCP tools/providers for LLM/agent applications; familiarity with frameworks such as LangChain or LlamaIndex.
- Data observability/quality tools (e.g., Great Expectations, Monte Carlo, Datafold) and strong lineage practices.
- Exposure to financial datasets and controls (PII handling, encryption, masking).