What are the responsibilities and job description for the Senior Data Engineer position at SoTalent?
Applicants must be US citizens due to contractual requirements.
Role Overview
Our client is seeking an experienced Senior Data Engineer to design and deliver advanced data systems that support a wide range of artificial intelligence applications. This role focuses on building scalable data solutions that enable better insights, strengthen decision-making, and power modern AI capabilities across the organisation.
You will work closely with leadership in AI and collaborate with cross-functional teams, contributing to the development of systems that support traditional machine learning, generative AI, and emerging autonomous (agent-based) technologies.
The position follows a hybrid working model and is based in either Phoenix, Arizona, or Charlotte, North Carolina.
Key Responsibilities
- Develop and manage data infrastructure supporting various AI use cases, including predictive models, generative systems, and autonomous workflows
- Build scalable pipelines to process diverse data formats such as structured datasets, text, images, audio, video, and system logs
- Create feature engineering workflows to prepare data for machine learning models
- Design pipelines tailored for large language models, including embedding generation, data segmentation, and contextual data preparation (see the sketch after this list)
- Support data workflows for intelligent systems that rely on real-time processing, event-driven architecture, and persistent data storage
- Implement solutions using modern cloud and data platforms to enable large-scale data processing and analytics
- Establish and maintain data quality standards, governance practices, and monitoring systems
- Work closely with data scientists, engineers, and product teams to deliver datasets for model training, testing, and deployment
- Continuously optimise pipelines for efficiency, reliability, and performance across both batch and real-time systems
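To ground the large-language-model items above, here is a minimal sketch of a document-preparation step: it splits raw text into overlapping segments and attaches an embedding and positional metadata to each. All names are illustrative, and `embed_text` is a toy placeholder for whatever embedding model or service the team actually uses.

```python
from dataclasses import dataclass
from typing import Iterator, List

@dataclass
class Chunk:
    doc_id: str
    position: int
    text: str
    embedding: List[float]

def segment(text: str, size: int = 512, overlap: int = 64) -> Iterator[str]:
    """Yield overlapping windows so context survives chunk boundaries."""
    step = size - overlap
    for start in range(0, max(len(text) - overlap, 1), step):
        yield text[start:start + size]

def embed_text(text: str) -> List[float]:
    # Toy placeholder: a real pipeline would call an embedding model or service.
    return [float(len(text)), float(sum(map(ord, text)) % 997)]

def prepare_document(doc_id: str, text: str) -> List[Chunk]:
    """Turn one raw document into embedded, position-tagged chunks."""
    return [
        Chunk(doc_id=doc_id, position=i, text=piece, embedding=embed_text(piece))
        for i, piece in enumerate(segment(text))
    ]
```

Overlap between segments is a common way to keep context from being cut off at chunk boundaries, which matters for retrieval quality downstream.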
MLOps & Platform Responsibilities
- Lead the development and operation of machine learning pipelines across the full lifecycle, from training to deployment
- Build reliable workflows for model evaluation, versioning, and production rollout
- Collaborate with technical teams to deploy AI models, generative systems, and retrieval-based solutions
- Design systems for distributed training, parameter tuning, and efficient model serving
- Implement monitoring and validation frameworks to track model performance, detect drift (see the drift-check sketch after this list), and ensure compliance
- Manage automated deployment processes for models, data assets, and AI components using modern CI/CD practices
- Oversee experiment tracking, model lifecycle management, and environment promotion processes
- Ensure seamless integration between machine learning frameworks and cloud-based infrastructure
- Maintain high standards for system scalability, reliability, and observability
- Define best practices, reusable patterns, and standards for AI and data engineering workflows
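As one concrete example of the drift monitoring mentioned above, the sketch below computes a Population Stability Index (PSI) between a model's training-time score distribution and its live scores. PSI is a standard drift signal; the synthetic data and the 0.2 threshold noted in the comments are illustrative assumptions, not a prescribed implementation.

```python
import numpy as np

def population_stability_index(expected: np.ndarray, observed: np.ndarray,
                               bins: int = 10) -> float:
    """Population Stability Index between a reference and a live distribution.

    Bin edges come from the reference (training-time) scores so the comparison
    stays anchored to what the model originally saw.
    """
    exp_counts, edges = np.histogram(expected, bins=bins)
    # Clip live scores into the reference range so nothing falls outside the bins.
    obs_counts, _ = np.histogram(np.clip(observed, edges[0], edges[-1]), bins=edges)
    exp_pct = np.clip(exp_counts / len(expected), 1e-6, None)  # avoid log(0)
    obs_pct = np.clip(obs_counts / len(observed), 1e-6, None)
    return float(np.sum((obs_pct - exp_pct) * np.log(obs_pct / exp_pct)))

# Illustrative usage with synthetic scores (assumed data, not real model output):
rng = np.random.default_rng(0)
train_scores = rng.normal(0.5, 0.1, 10_000)
live_scores = rng.normal(0.55, 0.12, 10_000)  # slightly shifted distribution
print(f"PSI = {population_stability_index(train_scores, live_scores):.3f}")
```

A common rule of thumb treats PSI below 0.1 as stable, 0.1 to 0.2 as worth watching, and above 0.2 as significant drift; any real deployment would pair the metric with the team's own thresholds and alerting.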
Required Qualifications
- Bachelor’s degree in computer science, engineering, or a related technical discipline
- At least five years of experience in data engineering, large-scale data systems, or machine learning data pipelines
- Strong experience with distributed data processing frameworks (e.g., Apache Spark)
- Proficiency in Python and SQL, with experience handling large datasets
- Hands-on experience building cloud-based data pipelines (e.g., AWS or similar platforms; a brief sketch follows this list)
- Experience designing data systems that support multiple AI workloads, including model training and inference
- Solid understanding of data modelling, integration, and production-grade pipeline development
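For orientation, a minimal PySpark batch job of the kind the Spark and cloud-pipeline requirements point at might look like the sketch below. The bucket paths and column names (`event_ts`, `event_type`, `user_id`) are assumptions for illustration only.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("daily-events-rollup").getOrCreate()

# Read a partitioned event dataset; the path and schema are illustrative only.
events = spark.read.parquet("s3://example-bucket/events/")

daily = (
    events
    .withColumn("event_date", F.to_date("event_ts"))       # assumed timestamp column
    .groupBy("event_date", "event_type")
    .agg(
        F.count("*").alias("events"),
        F.countDistinct("user_id").alias("unique_users"),   # assumed user column
    )
)

# Write the rollup back out, partitioned by day for efficient downstream reads.
daily.write.mode("overwrite").partitionBy("event_date").parquet(
    "s3://example-bucket/rollups/daily_events/"
)
```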
Preferred Experience
- Exposure to AI systems at scale, including traditional ML, generative AI, and agent-based solutions
- Familiarity with vector databases and semantic search technologies (see the sketch at the end of this list)
- Knowledge of preparing data for large language models, including text processing and context structuring
- Experience working with unstructured data processing techniques (e.g., NLP, OCR, computer vision)
- Experience with workflow orchestration and AI development platforms
- Understanding of MLOps tools and practices, including experiment tracking and deployment automation
- Awareness of emerging AI architectures such as memory-driven or tool-integrated systems
- Strong analytical thinking, attention to detail, and commitment to data quality
- Ability to work effectively in fast-moving, collaborative environments
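To illustrate the vector-database and semantic-search item above at its smallest scale, the sketch below does brute-force cosine-similarity retrieval with NumPy. A production system would delegate this to a vector database with an approximate-nearest-neighbour index; the function here only shows the core computation, and the random vectors stand in for real document embeddings.

```python
import numpy as np

def top_k_semantic(query_vec: np.ndarray, doc_matrix: np.ndarray, k: int = 5):
    """Indices and scores of the k rows most similar to the query (cosine).

    A vector database performs the same operation at scale using approximate
    indexes (e.g., HNSW); this brute-force version shows the core step.
    """
    q = query_vec / np.linalg.norm(query_vec)
    docs = doc_matrix / np.linalg.norm(doc_matrix, axis=1, keepdims=True)
    scores = docs @ q                       # cosine similarity per document
    top = np.argsort(scores)[::-1][:k]      # highest-scoring documents first
    return top, scores[top]

# Illustrative usage with random vectors standing in for document embeddings:
rng = np.random.default_rng(1)
corpus = rng.normal(size=(1_000, 128))
query = rng.normal(size=128)
idx, sims = top_k_semantic(query, corpus, k=3)
print(idx, sims)
```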