What are the responsibilities and job description for the ML Inference Software Engineer position at Lumex Talent?
A well-funded AI startup is building a platform that lets anyone instantly generate fully interactive 2D/3D worlds from natural language. Backed by a $28M seed round and founded by engineers from Stanford, NVIDIA, Meta, and Epic Games, they’re combining multimodal reasoning, simulation, graphics, and real-time generation into one unified system.
They’re hiring a Senior ML Infrastructure Engineer to take ownership of GPU performance, model serving, and end-to-end inference optimization.
What You’ll Do
- Improve model throughput and reduce latency and cost, targeting 2–10× gains
- Optimize the GPU stack using CUDA/Triton kernels, FlashAttention, paged attention, and CUDA Graphs (the paged-attention idea is sketched after this list)
- Build and refine inference systems with TensorRT-LLM, Triton Inference Server, vLLM/TGI
- Implement advanced performance techniques: continuous batching, on-GPU KV reuse, and speculative decoding/Medusa (see the second sketch after this list)
- Own profiling, optimization, deployment, and validation of all core inference workflows
- Work closely with research and engine teams to support real-time world generation and simulation
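For context on the paged-attention bullet above, here is a minimal sketch of the core idea: the KV cache lives in fixed-size blocks addressed through a per-sequence block table, so memory grows per block rather than per pre-reserved max sequence length. All names here (`BlockAllocator`, `BLOCK_SIZE`, etc.) are illustrative and are not the vLLM API.

```python
# Sketch of paged-attention KV-cache bookkeeping: fixed-size blocks drawn
# from a shared pool, mapped to sequences via block tables. Illustrative
# only; names and structure are assumptions, not vLLM internals.
from dataclasses import dataclass, field

BLOCK_SIZE = 16  # tokens per KV block (assumed toy value)

@dataclass
class BlockAllocator:
    num_blocks: int
    free: list[int] = field(default_factory=list)

    def __post_init__(self):
        self.free = list(range(self.num_blocks))

    def alloc(self) -> int:
        if not self.free:
            raise MemoryError("KV cache pool exhausted")
        return self.free.pop()

    def release(self, block: int) -> None:
        self.free.append(block)

@dataclass
class Sequence:
    block_table: list[int] = field(default_factory=list)
    length: int = 0

    def append_token(self, allocator: BlockAllocator) -> None:
        # Allocate a new block only when the current one fills up, so
        # memory grows in BLOCK_SIZE steps instead of max_seq_len upfront.
        if self.length % BLOCK_SIZE == 0:
            self.block_table.append(allocator.alloc())
        self.length += 1

allocator = BlockAllocator(num_blocks=1024)
seq = Sequence()
for _ in range(40):
    seq.append_token(allocator)
print(seq.block_table)  # 3 blocks cover 40 tokens at block size 16
```

The payoff is that sequences of different lengths share one GPU memory pool without fragmentation, which is what makes high-occupancy continuous batching practical.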
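The speculative-decoding bullet refers to the draft-and-verify pattern below. This is a simplified greedy-acceptance sketch, not the full rejection-sampling scheme or Medusa's multi-head variant; `draft_next` and `target_next` are hypothetical stand-ins for real model calls.

```python
# Sketch of speculative decoding under greedy acceptance: a cheap draft
# model proposes k tokens, the target model checks them (in practice in one
# batched forward pass), and the longest agreeing prefix is accepted.
from typing import Callable, List

def speculative_step(
    prefix: List[int],
    draft_next: Callable[[List[int]], int],   # cheap model: greedy next token
    target_next: Callable[[List[int]], int],  # expensive model: greedy next token
    k: int = 4,
) -> List[int]:
    # 1) Draft k tokens autoregressively with the cheap model.
    draft, ctx = [], list(prefix)
    for _ in range(k):
        t = draft_next(ctx)
        draft.append(t)
        ctx.append(t)

    # 2) Verify against the target model; here emulated per position,
    #    though a real implementation scores all k positions at once.
    accepted, ctx = [], list(prefix)
    for t in draft:
        want = target_next(ctx)
        if want != t:
            accepted.append(want)  # target's correction ends the step
            break
        accepted.append(t)
        ctx.append(t)
    else:
        # All k drafts accepted: take one bonus token from the verify pass.
        accepted.append(target_next(ctx))
    return accepted

# Toy usage: a draft model that agrees with the target most of the time.
target = lambda ctx: (len(ctx) * 7) % 13
draft = lambda ctx: target(ctx) if len(ctx) % 5 else 0
print(speculative_step([1, 2, 3], draft, target))  # accepts 2 drafts, then corrects
```

Each accepted draft token trades one expensive sequential target-model step for a cheap draft step, which is where the latency win comes from.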
What They’re Looking For
- 2–3 years of experience in ML infrastructure, GPU systems, or LLM inference
- Strong background in GPU performance optimization
- Experience with high-performance serving stacks and distributed ML systems
- Comfortable operating in a fast-paced, high-ownership startup environment
Why This Role Matters
This role directly shapes how fast their models run, how the platform scales, and how creators and agents interact inside generated worlds in real time.
Salary: $200,000–$500,000