What are the responsibilities and job description for the Data Engineer - AI Systems position at Delta System & Software, Inc.?
Hi,
Hope you are doing well,
Position: Data Engineer - AI Systems
Duration: 6 months (W2 Contract Only)
Location: St. Louis, MO (Onsite)
Primary Skills: Data Engineer, Databricks, Python, PySpark, AI/ML
Job Description:
We’re building intelligent, Databricks-powered AI systems that structure and activate information from diverse enterprise sources (Confluence, OneDrive, PDFs, and more). As a Data Engineer, you’ll design and optimize the data pipelines that transform raw and unstructured content into clean, AI-ready datasets for machine learning and generative AI agents.
You’ll collaborate with a cross-functional team of Machine Learning Engineers, Software Developers, and domain experts to create high-quality data foundations that power Databricks-native AI agents and retrieval systems.
Key Responsibilities
- Develop Scalable Pipelines: Design, build, and maintain high-performance ETL and ELT workflows using Databricks, PySpark, and Delta Lake (see the sketch after this list).
- Data Integration: Build APIs and connectors to ingest data from collaboration platforms such as Confluence, OneDrive, and other enterprise systems.
- Unstructured Data Handling: Implement extraction and transformation pipelines for text, PDFs, and scanned documents using Databricks OCR and related tools.
- Data Modelling: Design Delta Lake and Unity Catalog data models for both structured and vectorized (embedding-based) data stores.
- Data Quality & Observability: Apply validation, version control, and quality checks to ensure pipeline reliability and data accuracy.
- Collaboration: Work closely with ML Engineers to prepare datasets for LLM fine-tuning and vector database creation, and with Software Engineers to deliver end-to-end data services.
- Performance & Automation: Optimize workflows for scale and automation, leveraging Databricks Jobs, Workflows, and CI/CD best practices.
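To give a flavor of the pipeline work above, here is a minimal PySpark + Delta Lake ingestion sketch. The source path, target path, and column names are illustrative assumptions, not details of the actual project.

```python
# Minimal sketch of an ETL step: ingest raw JSON exports (e.g., Confluence
# pages), normalize the text, and write an AI-ready Delta table.
# Paths and column names below are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("confluence-ingest").getOrCreate()

# Read raw, semi-structured exports; schema is inferred for the sketch.
raw = spark.read.json("/mnt/raw/confluence/")  # hypothetical source path

# Light cleaning: strip markup, collapse whitespace, drop empty pages,
# and stamp each row for lineage.
clean = (
    raw.withColumn("body_text", F.regexp_replace("body", r"<[^>]+>", " "))
       .withColumn("body_text", F.trim(F.regexp_replace("body_text", r"\s+", " ")))
       .filter(F.length("body_text") > 0)
       .withColumn("ingested_at", F.current_timestamp())
       .select("page_id", "title", "body_text", "ingested_at")
)

# Write as Delta; mergeSchema tolerates additive upstream schema changes.
(clean.write
      .format("delta")
      .mode("append")
      .option("mergeSchema", "true")
      .save("/mnt/curated/confluence_pages"))  # hypothetical target path
```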
What You Bring
- Experience with data engineering, ETL development, or data pipeline automation.
- Proficiency in Python, SQL, and PySpark.
- Hands-on experience with Databricks, Spark, and Delta Lake.
- Familiarity with data APIs, JSON, and unstructured data processing (OCR, text extraction).
- Understanding of data versioning, schema evolution, and data lineage concepts (illustrated in the sketch after this list).
- Interest in AI/ML data pipelines, vector databases, and intelligent data systems.
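As a quick illustration of the data-versioning point above, Delta Lake supports time travel, so a pipeline run can be reproduced or audited against an earlier table version. A minimal sketch (the table path is hypothetical):

```python
# Minimal Delta Lake time-travel sketch: pin a read to an earlier table
# version to reproduce or audit a past pipeline run.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("versioning-demo").getOrCreate()

df_v0 = (
    spark.read
         .format("delta")
         .option("versionAsOf", 0)   # or timestampAsOf for time-based reads
         .load("/mnt/curated/confluence_pages")  # hypothetical table path
)
df_v0.show(5)
```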
Bonus Skills
- Experience with vector databases (e.g., Pinecone, Chroma, FAISS) or Databricks’ Vector Search (see the sketch after this list).
- Exposure to LLM-based architectures, LangChain, or Databricks Mosaic AI.
- Knowledge of data governance frameworks, Unity Catalog, or access control best practices.
- Familiarity with REST API development or data synchronization services (e.g., Airbyte, Fivetran, custom connectors).
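Since FAISS is named above, here is a minimal sketch of the vector-search pattern: index document embeddings, then retrieve nearest neighbors for a query. The random vectors are placeholders for model-generated embeddings.

```python
# Minimal FAISS sketch: index placeholder document embeddings and run a
# similarity query. In a real RAG pipeline the vectors would come from
# an embedding model, not a random generator.
import numpy as np
import faiss

dim = 384                           # hypothetical embedding dimensionality
rng = np.random.default_rng(0)

doc_vectors = rng.random((1000, dim), dtype="float32")  # fake corpus
index = faiss.IndexFlatL2(dim)      # exact L2 search; fine at this scale
index.add(doc_vectors)

query = rng.random((1, dim), dtype="float32")           # fake query vector
distances, ids = index.search(query, 5)                 # top-5 neighbors
print("nearest doc ids:", ids[0])
```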