What are the responsibilities and job description for the ML-Infrastructure Engineer position at Coval?
THE ROLE
Every simulation we run touches multiple models (LLMs, speech-to-text, text-to-speech), and our Fortune 500 customers need hundreds, sometimes thousands of these running concurrently. Making that fast, reliable, and cost-efficient is the job.
We've built the skeleton. Our team has done this before, operating compute at Waymo at a scale measured in single-digit percentages of Google's, for massive workloads. The auto-scaling foundations, the queuing systems, and the monitoring patterns are in place. But we're at an inflection point: demand is growing fast, and there's a ton of low-hanging fruit, from optimizing how many workloads run on a single machine, to tuning scaling algorithms, to deciding what to self-host versus what to keep as managed services.
You'll Own Our Model Infrastructure End To End
- Scaling GPU and compute infrastructure. Architect and operate the auto-scaling systems that handle spikes of hundreds to thousands of concurrent simulations (see the sketch after this list). Optimize how we provision, schedule, and monitor GPU instances.
- Making the hosting decisions. We use a mix of closed-source hosted models and open-source self-hosted models today. You'll evaluate the tradeoffs (cost, latency, quality) and make the calls on what to host, where, and how it connects to the rest of our pipeline.
- Making our pipelines go fast. You're obsessive about performance. You want to know exactly what compute we're using, where the bottlenecks are, and how to squeeze more throughput out of every machine. You live in monitoring dashboards and you love it.
- Staying on the frontier of models. Voice AI models are getting commoditized, and we get to experiment with all of them. You'll benchmark the latest models across the full voice stack, run comparisons, and help us stay ahead of what's coming next.
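To make the scaling work concrete, here is a minimal sketch of a queue-depth-based scale-out decision. The constants, names, and thresholds are hypothetical illustrations, not Coval's actual implementation.

```python
import math
from dataclasses import dataclass

# Hypothetical knobs -- real values would come from load testing, not guesses.
SIMULATIONS_PER_GPU_WORKER = 8   # concurrent simulations one worker handles
MAX_GPU_WORKERS = 200            # hard cap to bound spend
SCALE_DOWN_HEADROOM = 0.25       # keep 25% spare capacity before shrinking

@dataclass
class FleetState:
    queued_simulations: int   # simulations waiting for a worker
    running_simulations: int  # simulations currently executing
    active_workers: int       # GPU workers currently provisioned

def desired_worker_count(state: FleetState) -> int:
    """Pick a target fleet size from current demand.

    Scale out aggressively on queue depth (customer runs arrive in spikes),
    scale in conservatively so the fleet doesn't thrash on short lulls.
    """
    total_demand = state.queued_simulations + state.running_simulations
    needed = math.ceil(total_demand / SIMULATIONS_PER_GPU_WORKER)

    if needed > state.active_workers:
        # Scale out: jump straight to the demand-implied size, capped.
        return min(needed, MAX_GPU_WORKERS)

    # Scale in: only shrink once demand drops well below current capacity.
    comfortable = math.ceil(needed * (1 + SCALE_DOWN_HEADROOM))
    return max(min(comfortable, state.active_workers), 1)
```

In practice a target like this would drive an AWS Auto Scaling group or a Kubernetes HPA rather than being applied directly, and the thresholds would be tuned from real traffic.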
What We're Looking For
- You've built and operated auto-scaling infrastructure for compute-heavy workloads, ideally involving GPUs and model serving.
- You're a hardware nerd at heart. You care about what instances we're running, how scaling policies are tuned, and whether we're leaving performance on the table.
- You're obsessive about monitoring and observability. You want to know when something is degrading before it becomes an incident.
- You can make pragmatic calls on build-vs-buy, self-host-vs-managed, open-source-vs-closed. You're excited about the latest open-source models but you know when paying for a service is the right move.
- You're curious about the full voice AI model stack (LLMs, STT, TTS) and you want to be immersed in how these models evolve month to month.
- You want to shape infrastructure at a company where the decisions aren't already made for you.
You'll work in Python, building and operating auto-scaling compute infrastructure on AWS with GPU instances, containerized deployments, and modern observability tooling. You'll work across both self-hosted open-source models and managed API services.
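Because the pipeline spans both hosting modes, it helps to keep them behind one interface so a hosting decision can change without rippling through callers. The sketch below is illustrative only; the class names, endpoint shape, and response format are assumptions, not Coval's actual code.

```python
from abc import ABC, abstractmethod

class TranscriptionBackend(ABC):
    """Common interface so the pipeline doesn't care where STT runs."""

    @abstractmethod
    def transcribe(self, audio_bytes: bytes) -> str: ...

class ManagedAPIBackend(TranscriptionBackend):
    """Calls a hosted speech-to-text API (vendor SDK injected, details omitted)."""

    def __init__(self, client):
        self.client = client  # e.g. a vendor SDK client, injected for testing

    def transcribe(self, audio_bytes: bytes) -> str:
        return self.client.transcribe(audio_bytes)

class SelfHostedBackend(TranscriptionBackend):
    """Calls an open-source model served on our own GPU instances."""

    def __init__(self, endpoint_url: str, session):
        self.endpoint_url = endpoint_url
        self.session = session  # e.g. a requests.Session

    def transcribe(self, audio_bytes: bytes) -> str:
        resp = self.session.post(self.endpoint_url, data=audio_bytes, timeout=30)
        resp.raise_for_status()
        return resp.json()["text"]
```

With this shape, a cost, latency, or quality experiment becomes a constructor swap instead of a refactor.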
Salary: $100,000 - $200,000