ML-Infrastructure Engineer

Coval
San Francisco, CA · Full Time
POSTED ON 4/9/2026
AVAILABLE BEFORE 5/7/2026
THE ROLE

Every simulation we run touches multiple models (LLMs, speech-to-text, text-to-speech), and our Fortune 500 customers need hundreds, sometimes thousands, of these simulations running concurrently. Making that fast, reliable, and cost-efficient is the job.

We've built the skeleton. Our team has done this before: at Waymo, we ran compute for massive workloads at a scale on the order of a single-digit percentage of Google's. The auto-scaling foundations, queuing systems, and monitoring patterns are in place. But we're at an inflection point. Demand is growing fast, and there's a ton of low-hanging fruit: optimizing how many workloads run on a single machine, tuning scaling algorithms, and deciding what to self-host versus what to keep as managed services.

You'll Own Our Model Infrastructure End to End

  • Scaling GPU and compute infrastructure. Architect and operate the auto-scaling systems that handle spikes of hundreds to thousands of concurrent simulations. Optimize how we provision, schedule, and monitor GPU instances.
  • Making the hosting decisions. We use a mix of closed-source hosted models and open-source self-hosted models today. You'll evaluate the tradeoffs (cost, latency, quality) and make the calls on what to host, where, and how it connects to the rest of our pipeline.
  • Making our pipelines go fast. You're obsessive about performance. You want to know exactly what compute we're using, where the bottlenecks are, and how to squeeze more throughput out of every machine. You live in monitoring dashboards and you love it.
  • Staying on the frontier of models. Voice AI models are getting commoditized, and we get to experiment with all of them. You'll benchmark the latest models across the full voice stack, run comparisons, and help us stay ahead of what's coming next.

What makes this more interesting than a similar role at a bigger AI company: you're not scoped to a narrow set of tasks. You'll develop and architect large parts of our compute infrastructure, and you'll shape the decisions about which models we run and how.

What We're Looking For

  • You've built and operated auto-scaling infrastructure for compute-heavy workloads, ideally involving GPUs and model serving.
  • You're a hardware nerd at heart. You care about what instances we're running, how scaling policies are tuned, and whether we're leaving performance on the table.
  • You're obsessive about monitoring and observability. You want to know when something is degrading before it becomes an incident.
  • You can make pragmatic calls on build-vs-buy, self-host-vs-managed, open-source-vs-closed. You're excited about the latest open-source models but you know when paying for a service is the right move.
  • You're curious about the full voice AI model stack (LLMs, STT, TTS) and you want to be immersed in how these models evolve month to month.
  • You want to shape infrastructure at a company where the decisions aren't already made for you.

What You'll Work With

You'll work in Python, building and operating auto-scaling compute infrastructure on AWS with GPU instances, containerized deployments, and modern observability tooling. You'll work across both self-hosted open-source models and managed API services.

Salary: $100,000 - $200,000
