What are the responsibilities and job description for the ML Inference Software Engineer position at Lumex Talent?
A well-funded AI startup is building a platform that lets anyone instantly generate fully interactive 2D/3D worlds from natural language. Backed by a $28M seed round and founded by engineers from Stanford, NVIDIA, Meta, and Epic Games, they’re combining multimodal reasoning, simulation, graphics, and real-time generation into one unified system.
They’re hiring a Senior ML Infrastructure Engineer to take ownership of GPU performance, model serving, and end-to-end inference optimization.
What You’ll Do
- Improve model throughput and reduce latency and cost, targeting 2–10× gains
- Optimize the GPU stack using CUDA/Triton kernels, FlashAttention, paged attention, and CUDA Graphs (the paged-attention idea is sketched after this list)
- Build and refine inference systems with TensorRT-LLM, Triton Inference Server, vLLM/TGI
- Implement advanced performance techniques: continuous batching, on-GPU KV reuse, and speculative decoding/Medusa (see the second sketch after this list)
- Own profiling, optimization, deployment, and validation of all core inference workflows
- Work closely with research and engine teams to support real-time world generation and simulation
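For context on the paged-attention bullet above, here is a minimal sketch of the core idea: the KV cache lives in fixed-size blocks addressed through a per-sequence block table, so memory grows per block rather than per pre-reserved max sequence length. All names here (`BlockAllocator`, `BLOCK_SIZE`, etc.) are illustrative and are not the vLLM API.

```python
# Sketch of paged-attention KV-cache bookkeeping: fixed-size blocks drawn
# from a shared pool, mapped to sequences via block tables. Illustrative
# only; names and structure are assumptions, not vLLM internals.
from dataclasses import dataclass, field

BLOCK_SIZE = 16  # tokens per KV block (assumed toy value)

@dataclass
class BlockAllocator:
    num_blocks: int
    free: list[int] = field(default_factory=list)

    def __post_init__(self):
        self.free = list(range(self.num_blocks))

    def alloc(self) -> int:
        if not self.free:
            raise MemoryError("KV cache pool exhausted")
        return self.free.pop()

    def release(self, block: int) -> None:
        self.free.append(block)

@dataclass
class Sequence:
    block_table: list[int] = field(default_factory=list)
    length: int = 0

    def append_token(self, allocator: BlockAllocator) -> None:
        # Allocate a new block only when the current one fills up, so
        # memory grows in BLOCK_SIZE steps instead of max_seq_len upfront.
        if self.length % BLOCK_SIZE == 0:
            self.block_table.append(allocator.alloc())
        self.length += 1

allocator = BlockAllocator(num_blocks=1024)
seq = Sequence()
for _ in range(40):
    seq.append_token(allocator)
print(seq.block_table)  # 3 blocks cover 40 tokens at block size 16
```

The payoff is that sequences of different lengths share one GPU memory pool without fragmentation, which is what makes high-occupancy continuous batching practical.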
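The speculative-decoding bullet refers to the draft-and-verify pattern below. This is a simplified greedy-acceptance sketch, not the full rejection-sampling scheme or Medusa's multi-head variant; `draft_next` and `target_next` are hypothetical stand-ins for real model calls.

```python
# Sketch of speculative decoding under greedy acceptance: a cheap draft
# model proposes k tokens, the target model checks them (in practice in one
# batched forward pass), and the longest agreeing prefix is accepted.
from typing import Callable, List

def speculative_step(
    prefix: List[int],
    draft_next: Callable[[List[int]], int],   # cheap model: greedy next token
    target_next: Callable[[List[int]], int],  # expensive model: greedy next token
    k: int = 4,
) -> List[int]:
    # 1) Draft k tokens autoregressively with the cheap model.
    draft, ctx = [], list(prefix)
    for _ in range(k):
        t = draft_next(ctx)
        draft.append(t)
        ctx.append(t)

    # 2) Verify against the target model; here emulated per position,
    #    though a real implementation scores all k positions at once.
    accepted, ctx = [], list(prefix)
    for t in draft:
        want = target_next(ctx)
        if want != t:
            accepted.append(want)  # target's correction ends the step
            break
        accepted.append(t)
        ctx.append(t)
    else:
        # All k drafts accepted: take one bonus token from the verify pass.
        accepted.append(target_next(ctx))
    return accepted

# Toy usage: a draft model that agrees with the target most of the time.
target = lambda ctx: (len(ctx) * 7) % 13
draft = lambda ctx: target(ctx) if len(ctx) % 5 else 0
print(speculative_step([1, 2, 3], draft, target))  # accepts 2 drafts, then corrects
```

Each accepted draft token trades one expensive sequential target-model step for a cheap draft step, which is where the latency win comes from.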
What They’re Looking For
- 2–3 years of experience in ML infrastructure, GPU systems, or LLM inference
- Strong background in GPU performance optimization
- Experience with high-performance serving stacks and distributed ML systems
- Comfortable operating in a fast-paced, high-ownership startup environment
Why This Role Matters
This role directly shapes how fast their models run, how the platform scales, and how creators and agents interact inside generated worlds in real time.
Salary: $200,000–$500,000