What are the responsibilities and job description for the ML-Infrastructure Engineer position at Coval?
THE ROLE
Every simulation we run touches multiple models (LLMs, speech-to-text, text-to-speech), and our Fortune 500 customers need hundreds, sometimes thousands of these running concurrently. Making that fast, reliable, and cost-efficient is the job.
We've built the skeleton. Our team has done this before, operating compute at Waymo at a scale measured in single-digit percentages of Google's, for massive workloads. The auto-scaling foundations, the queuing systems, and the monitoring patterns are in place. But we're at an inflection point: demand is growing fast, and there's a ton of low-hanging fruit, from optimizing how many workloads run on a single machine, to tuning scaling algorithms, to deciding what to self-host versus what to keep as managed services.
You'll Own Our Model Infrastructure End To End
- Scaling GPU and compute infrastructure. Architect and operate the auto-scaling systems that handle spikes of hundreds to thousands of concurrent simulations (see the sketch after this list). Optimize how we provision, schedule, and monitor GPU instances.
- Making the hosting decisions. We use a mix of closed-source hosted models and open-source self-hosted models today. You'll evaluate the tradeoffs (cost, latency, quality) and make the calls on what to host, where, and how it connects to the rest of our pipeline.
- Making our pipelines go fast. You're obsessive about performance. You want to know exactly what compute we're using, where the bottlenecks are, and how to squeeze more throughput out of every machine. You live in monitoring dashboards and you love it.
- Staying on the frontier of models. Voice AI models are getting commoditized, and we get to experiment with all of them. You'll benchmark the latest models across the full voice stack, run comparisons, and help us stay ahead of what's coming next.
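To make the scaling work concrete, here is a minimal sketch of a queue-depth-based scale-out decision. The constants, names, and thresholds are hypothetical illustrations, not Coval's actual implementation.

```python
import math
from dataclasses import dataclass

# Hypothetical knobs -- real values would come from load testing, not guesses.
SIMULATIONS_PER_GPU_WORKER = 8   # concurrent simulations one worker handles
MAX_GPU_WORKERS = 200            # hard cap to bound spend
SCALE_DOWN_HEADROOM = 0.25       # keep 25% spare capacity before shrinking

@dataclass
class FleetState:
    queued_simulations: int   # simulations waiting for a worker
    running_simulations: int  # simulations currently executing
    active_workers: int       # GPU workers currently provisioned

def desired_worker_count(state: FleetState) -> int:
    """Pick a target fleet size from current demand.

    Scale out aggressively on queue depth (customer runs arrive in spikes),
    scale in conservatively so the fleet doesn't thrash on short lulls.
    """
    total_demand = state.queued_simulations + state.running_simulations
    needed = math.ceil(total_demand / SIMULATIONS_PER_GPU_WORKER)

    if needed > state.active_workers:
        # Scale out: jump straight to the demand-implied size, capped.
        return min(needed, MAX_GPU_WORKERS)

    # Scale in: only shrink once demand drops well below current capacity.
    comfortable = math.ceil(needed * (1 + SCALE_DOWN_HEADROOM))
    return max(min(comfortable, state.active_workers), 1)
```

In practice a target like this would drive an AWS Auto Scaling group or a Kubernetes HPA rather than being applied directly, and the thresholds would be tuned from real traffic.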
What We're Looking For
- You've built and operated auto-scaling infrastructure for compute-heavy workloads, ideally involving GPUs and model serving.
- You're a hardware nerd at heart. You care about what instances we're running, how scaling policies are tuned, and whether we're leaving performance on the table.
- You're obsessive about monitoring and observability. You want to know when something is degrading before it becomes an incident.
- You can make pragmatic calls on build-vs-buy, self-host-vs-managed, open-source-vs-closed. You're excited about the latest open-source models but you know when paying for a service is the right move.
- You're curious about the full voice AI model stack (LLMs, STT, TTS) and you want to be immersed in how these models evolve month to month.
- You want to shape infrastructure at a company where the decisions aren't already made for you.
You'll work in Python, building and operating auto-scaling compute infrastructure on AWS with GPU instances, containerized deployments, and modern observability tooling. You'll work across both self-hosted open-source models and managed API services.
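Because the pipeline spans both hosting modes, it helps to keep them behind one interface so a hosting decision can change without rippling through callers. The sketch below is illustrative only; the class names, endpoint shape, and response format are assumptions, not Coval's actual code.

```python
from abc import ABC, abstractmethod

class TranscriptionBackend(ABC):
    """Common interface so the pipeline doesn't care where STT runs."""

    @abstractmethod
    def transcribe(self, audio_bytes: bytes) -> str: ...

class ManagedAPIBackend(TranscriptionBackend):
    """Calls a hosted speech-to-text API (vendor SDK injected, details omitted)."""

    def __init__(self, client):
        self.client = client  # e.g. a vendor SDK client, injected for testing

    def transcribe(self, audio_bytes: bytes) -> str:
        return self.client.transcribe(audio_bytes)

class SelfHostedBackend(TranscriptionBackend):
    """Calls an open-source model served on our own GPU instances."""

    def __init__(self, endpoint_url: str, session):
        self.endpoint_url = endpoint_url
        self.session = session  # e.g. a requests.Session

    def transcribe(self, audio_bytes: bytes) -> str:
        resp = self.session.post(self.endpoint_url, data=audio_bytes, timeout=30)
        resp.raise_for_status()
        return resp.json()["text"]
```

With this shape, a cost, latency, or quality experiment becomes a constructor swap instead of a refactor.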
Salary: $100,000 - $200,000