What are the responsibilities and job description for the MLOps Engineer - NYC, NY - Hybrid position at InfiCare Technologies?
Position: MLOps Engineer
Location: NYC, NY
Mode Of Hire: Contract
Mode Of Work: Hybrid (3 days from office)
Job Description:
- Design, deploy, and operate end‑to‑end production ML pipelines across Dev, QA, and Prod environments.
- Set up and manage AWS SageMaker pipelines, endpoints, and monitoring for large-scale inference workloads, including embedding generation, named entity recognition, reranking, and video processing.
- Own GPU and CPU infrastructure selection, scaling, and optimization, including instance benchmarking, autoscaling behavior, and load testing.
- Deploy, monitor, and operate inference services that support hundreds of thousands of queries per day across text, image, and video pipelines.
- Establish standardized ML deployment patterns at AP, including:
  - Containerization and orchestration strategies
  - Environment isolation (Dev / QA / Prod)
  - Versioned promotion, rollback, and recovery mechanisms
- Implement monitoring, alerting, drift detection, and evaluation metrics for production ML systems, tracking latency, error rates, throughput, and model/data drift.
- Enable A/B testing and controlled rollout strategies for ML models in production, in partnership with engineering and product teams.
- Partner closely with ML Engineers, Data Scientists, DevOps, and Platform teams to:
  - Operationalize new models and pipeline improvements
  - Promote systems across environments safely
  - Ensure deployments meet reliability, scale, and cost targets
- Manage high-throughput I/O and data movement for large collections of media assets (text, images, video), avoiding CPU, network, and storage bottlenecks.
- Reduce operational risk by enforcing reproducibility, observability, security, and cost controls across all production ML systems.
Required Skills & Experience:
- Hands‑on experience deploying and operating ML inference systems in production.
- Strong experience with AWS SageMaker, including pipelines, endpoints, monitoring, and multi‑environment deployments.
- Expertise deploying ML models using PyTorch and TensorFlow from an operational and serving perspective.
- Proven experience with model deployment and orchestration, including containerized inference and autoscaling.
- Experience selecting, evaluating, and optimizing compute resources (GPU/CPU) for production ML workloads.
- Experience setting up monitoring, evaluation metrics, and A/B testing frameworks for ML systems in production.
- Ability to collaborate effectively with ML Engineers, Data Scientists, and platform teams in a shared ownership model.
Strongly Preferred:
- Operational experience supporting ML systems involving:
  - Transformer‑based NLP models (e.g., BERT‑family models)
  - Computer vision models
  - Ranking and reranking systems
- Familiarity with running production systems that use common ML model types, such as:
  - Convolutional and feed‑forward neural networks
  - Ranking algorithms
  - Approximate Nearest Neighbor methods (e.g., HNSW)
- Experience running ML workloads over large‑scale text, image, and video datasets.