ML Engineer - Inference [33157]

Stealth Startup
San Jose, CA | Full Time
POSTED ON 5/23/2026
AVAILABLE BEFORE 6/22/2026

The role:

As our first ML Engineer specializing in inference and optimization, you'll bridge the gap between cutting-edge research models and production systems. Your expertise will transform PyTorch research code into highly optimized, low-latency inference solutions that power our user-facing applications. You'll work closely with our GenAI researchers, vision ML engineers, and backend team to deliver exceptional performance.

What you’ll do:

  • Deploy and integrate researcher-trained model checkpoints into our cloud infrastructure and production pipelines.
  • Conduct thorough performance profiling and benchmarking to identify and eliminate computational bottlenecks.
  • Implement neural network optimization techniques including quantization, pruning, and architectural refinements while preserving model accuracy.
  • Develop efficient training and fine-tuning strategies with optimal precision trade-offs and parallelism.
  • Build and maintain scalable multi-GPU inference solutions with sophisticated model parallelism and serving architectures.
  • Collaborate with the research team to ensure optimizations integrate smoothly with model development workflows.

You may be a strong fit if you:

  • Have experience deploying and optimizing deep learning models for production environments, particularly with multi-GPU inference and large-scale model serving.
  • Are well-versed in cutting-edge techniques for optimizing both inference and training workloads.
  • Possess strong knowledge of efficient attention mechanisms and algorithms.
  • Have hands-on experience implementing model quantization and working with inference frameworks.
  • Can write production-quality code and successfully integrate ML models into robust inference pipelines.
  • Are familiar with various cloud platforms, storage solutions, and modern training frameworks.

Logistics:

  • This role is based in San Jose, where we work in person. We believe the best ideas come from being in the same room.
  • We sponsor visas and are committed to working through the process together with the right candidates. If you're currently outside the US, we'll also help you relocate to the US as part of this process.
  • We offer generous health, dental, and vision coverage, unlimited PTO, paid parental leave, and relocation support as needed.
  • Don't meet every single qualification? That’s okay — we care more about your trajectory than checking every box. If the role excites you and the mission resonates, we'd love to hear from you.

Note: In the event your application is successful and an offer of employment is made to you, any offer of employment will be conditional on the results of a background check, performed by a third party acting on our behalf.

Salary.com Estimation for ML Engineer - Inference [33157] in San Jose, CA
$113,518 to $145,766