LLM Inference Engineer

Majestic Labs ai
Los Altos, CA Full Time
POSTED ON 6/15/2026
AVAILABLE BEFORE 6/14/2027
The Role

In this high-impact role, you are the bridge between cutting-edge custom silicon and production-grade AI. You will own the end-to-end LLM serving stack on Majestic hardware, architecting everything from serving APIs down to KV cache management, batching, and scheduling. Your primary mission is to port leading frameworks like vLLM and SGLang to our accelerator and optimize them for peak performance. Because our architecture offers memory headroom, you won't just match traditional GPUs; you will shatter their limits on throughput, batch sizes, and context lengths. As you hunt down bottlenecks, your insights will directly steer our future kernel, compiler, and hardware development.

What You'll Own

  • The serving stack, end to end — bring up and adapt a modern inference framework (vLLM, SGLang, or similar) to run on Majestic hardware.
  • The runtime hot path — continuous batching, the scheduler, paged KV cache, and prefill/decode disaggregation.
  • Distributed inference at scale — tensor, pipeline, and expert parallelism across accelerators, wired into our collective communication library (CCL).
  • The multi-modal pipeline — image, audio, and video preprocessing, encoder integration, and mixed-modality batching.
  • Inference-time techniques — speculative decoding, prefix caching, and structured decoding.
  • End-to-end performance — profile, benchmark, and hunt down bottlenecks across the full serving path, feeding findings back to the kernel, compiler, and hardware teams.

What We're Looking For

  • 3+ years building or operating production LLM inference and serving systems (5+ preferred).
  • Deep, hands-on work with a modern inference framework (vLLM, SGLang, TensorRT-LLM, Fireworks, or similar), including its scheduler, paged attention / KV cache, model executor, and backend integration points.
  • Strong Python and C++, with the ability to move fluidly between the two.
  • A real grasp of transformer inference: the prefill/decode split, KV cache behavior, and how batching dynamics shape latency and throughput.
  • Distributed inference experience: tensor and pipeline parallelism across multiple devices.
  • An instinct for performance: you can profile an end-to-end stack and chase a regression from the serving API all the way down to the kernel.

Salary.com Estimation for LLM Inference Engineer in Los Altos, CA
$109,085 to $128,284



