
ML Inference Software Engineer

Lumex Talent
Palo Alto, CA | Full Time
POSTED ON 11/26/2025
AVAILABLE BEFORE 12/25/2025

A well-funded AI startup is building a platform that lets anyone generate fully interactive 2D/3D worlds from natural language instantly. Backed by a $28M seed round and founded by engineers from Stanford, NVIDIA, Meta, and Epic Games, they’re combining multimodal reasoning, simulation, graphics, and real-time generation into one unified system.


They’re hiring a Senior ML Infrastructure Engineer to take ownership of GPU performance, model serving, and end-to-end inference optimization.


What You’ll Do

  • Improve model throughput by 2–10× and cut latency and cost by a similar factor
  • Optimize the GPU stack using CUDA/Triton kernels, FlashAttention, paged attention, and CUDA Graphs
  • Build and refine inference systems with TensorRT-LLM, Triton Inference Server, and vLLM/TGI
  • Implement advanced performance techniques: continuous batching, on-GPU KV reuse, speculative decoding/Medusa
  • Own profiling, optimization, deployment, and validation of all core inference workflows
  • Work closely with research and engine teams to support real-time world generation and simulation

What They’re Looking For

  • 2–3 years in ML infrastructure, GPU systems, or LLM inference
  • Strong background in GPU performance optimization
  • Experience with high-performance serving stacks and distributed ML systems
  • Comfortable operating in a fast-paced, high-ownership startup environment

Why This Role Matters

This role directly shapes how fast their models run, how the platform scales, and how creators and agents interact inside generated worlds in real time.

Salary: $200,000 - $500,000



