What are the responsibilities and job description for the Associate Director, MLOps position at Evolution USA?
I’m partnering with a fast‑growing, mission‑driven AI technology company operating at the intersection of machine learning, large‑scale infrastructure, and real‑world impact. The business is well‑funded, product‑led, and already deploying AI systems into production at meaningful scale.
They’re now hiring an Associate Director, MLOps to lead the team responsible for the core ML infrastructure that supports large‑scale training, inference, and production deployment.
The Opportunity
This is a senior, hands‑on leadership role where you’ll own the ML platform that bridges research and production. You’ll be responsible for evolving the MLOps stack to support the next phase of scale — both technically and organizationally.
The environment suits someone who enjoys building reliable systems, leading strong engineers, and solving complex infrastructure problems in a highly collaborative setting.
The technical scope is broad and modern, including:
- Large‑scale ML training and inference workloads
- Cloud infrastructure & Kubernetes
- Distributed systems & observability
- Platform tooling and DevOps best practices
What You’ll Be Responsible For
- MLOps Vision & Strategy: Define and execute the long‑term roadmap for the MLOps platform, balancing short‑term delivery with long‑term architectural evolution.
- Team Leadership: Lead, mentor, and grow a team of 6–7 engineers. Allocate resources effectively across platform support and strategic initiatives.
- Cross‑Functional Partnership: Work closely with ML, data science, product engineering, infrastructure, and SRE leadership to identify bottlenecks, improve developer experience, and enable faster model deployment.
- Scalable ML Foundations: Architect compute and storage pipelines capable of supporting extremely large datasets and complex derived artifacts with strong guarantees around performance and consistency.
- Inference Platform Modernization: Drive modernization of the production inference stack to support 5–10× growth in AI workloads across global deployments.
- Observability & Cost Transparency: Collaborate with SRE to implement metrics and tooling around utilization, performance, cost attribution, and turnaround time.
- Technology Evaluation & Stack Evolution: Lead build‑vs‑buy decisions and platform audits, benchmarking internal tools against best‑in‑class commercial and open‑source solutions.
Background They’re Looking For
- Degree in Computer Science, Engineering, or similar (or equivalent practical experience)
- 2–3 years leading engineering teams, ideally within MLOps, ML infrastructure, or platform engineering
- Strong hands‑on experience running ML workloads on Kubernetes and major cloud providers (AWS, GCP, or Azure)
- Experience with workflow orchestration (Airflow, Kubeflow, or similar), infrastructure‑as‑code (Terraform, Helm), and modern DevOps practices
- Proven experience managing very large datasets and high‑throughput, production inference systems
- Strong software engineering background in complex, distributed, multi‑language environments
- Practical use of AI developer tools (e.g., Copilot, Cursor, Claude) as part of the engineering workflow
Nice to Have (Not Essential)
- Experience with ML frameworks such as PyTorch or scikit‑learn
- Exposure to large‑scale data platforms (Spark, Hive, Databricks, EMR, etc.)
- Strong grounding in MLOps best practices: model lifecycle management, feature stores, monitoring, and CI/CD for ML
- Familiarity with security and compliance considerations in ML systems
US citizens / Green Card holders only.
Salary: $180,000–$240,000