What are the responsibilities and job description for the Sr. AI Data Engineer position at Visionary Innovative Technology Solutions LLC?
Job Title: Sr. AI Data Engineer with Scala, Spark, and Python experience
Location: Jersey City, NJ (5 days onsite)
Duration: Long-term project
Job Description: An AI Data Engineer applies traditional data engineering expertise to the data requirements of machine learning (ML) and artificial intelligence (AI) systems. They build, manage, and optimize the data pipelines and infrastructure needed to train and deploy AI models, ensuring that AI workflows have high-quality, scalable access to data.
Key Responsibilities
- Data Pipeline Development (ETL/ELT): Build and maintain data pipelines for batch and real-time streaming data, ensuring reliable ingestion and transformation from sources like databases and APIs.
- AI/LLM Data Preparation: Construct pipelines specifically for AI/GenAI models, including data extraction, cleaning, chunking, embedding, and grounding to prepare data for Retrieval-Augmented Generation (RAG).
- Model Training Support: Collaborate with data scientists to automate data flows for feature engineering, model training, and model retraining.
- Infrastructure Management: Design, build, and optimize scalable data infrastructure (Lakehouse/Data Warehouse) using cloud platforms like AWS, Azure, or GCP.
- MLOps & Deployment: Integrate data pipelines into MLOps processes to monitor model performance, detect data drift, and deploy models as APIs.
- Data Governance & Security: Implement data quality, security, and governance standards across all AI-related datasets.
Required Skills & Qualifications
- Programming Languages: Strong proficiency in Python (primary), SQL, and Scala or Java.
- Data Engineering Tools: Expertise in Apache Spark, Kafka, Airflow, or Databricks.
- Cloud Platforms: Hands-on experience with cloud data and AI platforms (e.g., Azure Synapse/Fabric, AWS Bedrock, Google Vertex AI).
- AI/ML Knowledge: Familiarity with machine learning frameworks (TensorFlow, PyTorch, scikit-learn) and vector databases or similarity-search libraries (Pinecone, Milvus, FAISS).
- Data Modeling: Strong understanding of data modeling, including relational, NoSQL, and star/snowflake schemas.