
ML Systems Engineer – Inference Serving

Oho Group
San Jose, CA Full Time
POSTED ON 6/17/2026
AVAILABLE BEFORE 7/15/2026

About the Opportunity


This is a ground-floor opportunity at a well-funded, stealth-mode AI hardware startup building a custom AI SoC and a full inference serving stack from scratch. You'll work shoulder-to-shoulder with world-class hardware and software engineers, with end-to-end ownership of how foundation models run on next-generation silicon.


What You'll Do


  • Be a core contributor on a small, senior team designing and building state-of-the-art inference serving and cluster scheduling capabilities for a custom AI SoC
  • Architect high-performance multi-node inference stacks, owning throughput and latency from first principles
  • Design and implement advanced optimisation strategies across hybrids of tensor, pipeline, and expert parallelism (TP/PP/EP), continuous batching, and KV cache management at the intersection of compute, networking, and storage
  • Drive performance improvements directly inside leading open-source frameworks, including vLLM, SGLang, and PyTorch
  • Build advanced cluster scheduling algorithms that push efficiency boundaries for large-scale foundation model serving
  • Engage directly with the open-source community, upstreaming optimisations and shaping the roadmap of widely used AI infrastructure projects
  • Apply rigorous benchmarking, testing, and debugging practices to maintain a production-grade stack running on novel silicon


What We're Looking For


  • Strong Python, C++, and PyTorch fundamentals with a proven track record of shipping high-quality software in fast-moving environments
  • 1+ years as an active developer contributing to LLM inference serving frameworks such as vLLM or SGLang
  • Deep knowledge of LLM inference internals — KV cache, batching strategies, and attention mechanisms
  • Experience running and optimising large-scale workloads on heterogeneous clusters
  • Strong performance analysis skills; GPU kernel development in CUDA, Triton, or ROCm is a plus
  • Familiarity with networking, storage management, or distributed scheduling technologies such as Orca or LMCache is a significant plus


Education

Master's or PhD in Computer Science, Computer Engineering, Electrical Engineering, or equivalent industry experience preferred.

Salary: $250,000 - $300,000




