Software Engineer - ML/LLM Inference

Alldus
San Francisco, CA | Full Time
POSTED ON 9/17/2025 CLOSED ON 1/5/2026

What are the responsibilities and job description for the Software Engineer - ML/LLM Inference position at Alldus?

My client is searching for a talented engineer to work on ML/LLM inference and serving. They specialize in developing next-gen LLM fine-tuning and inference engines.


We are seeking a talented and motivated Software Engineer specializing in Machine Learning (ML) and Large Language Model (LLM) inference to join our dynamic ML Inference team. In this role, you will bridge the gap between AI/ML research and systems programming to build and enhance our next-generation LLM Inference Engine. You will play a crucial role in optimizing the performance, scalability, and efficiency of our LLM serving systems.


Key Responsibilities:


Develop and Enhance Inference Engine:

  • Design, implement, and optimize the next-generation LLM Inference Engine.
  • Integrate the latest LLM inference techniques from research to reduce latency and increase throughput.


Performance Optimization:

  • Conduct deep performance optimizations across multiple layers of the technology stack, including PyTorch, C++, and CUDA.
  • Analyze and improve system performance to meet the demands of various use cases.


Customer Collaboration:

  • Work closely with customers to understand specific performance requirements and optimize solutions accordingly.
  • Provide technical expertise and support to ensure successful deployment and operation of inference systems.


Technical Leadership:

  • Define the roadmap and technical vision for the inference stack.
  • Lead initiatives to drive innovation and maintain the competitive edge of our inference technologies.


Infrastructure Development:

  • Collaborate with partner teams to build and maintain scalable, multi-replica serving infrastructure.
  • Ensure the reliability and scalability of LLM serving systems to handle increasing workloads.


Qualifications:


Technical Skills:

  • Proficiency in systems programming languages such as C++.
  • Strong experience with machine learning frameworks, particularly PyTorch.
  • Expertise in GPU programming and CUDA for performance optimization.
  • Solid understanding of AI/ML concepts, especially related to large language models.


Experience:

  • Proven experience in developing and optimizing ML/LLM inference systems.
  • Demonstrated ability to integrate research advancements into production systems.
  • Experience with performance tuning and profiling across various technology stacks.
  • Experience with vLLM.


Salary.com Estimation for Software Engineer - ML/LLM Inference in San Francisco, CA
$100,312 to $121,993

This job has expired.

Job openings at Alldus

  • Alldus York, NY
  • Full Stack Engineer (Back end lean) Alldus are currently helping a startup in NYC scale their engineering team to build an AI-driven SaaS product that help... more
  • 6 Days Ago

  • Alldus San Jose, CA
  • We’re looking for exceptional software engineers to design and scale next-generation data processing and analytics platforms that power OLTP, OLAP, and lar... more
  • 6 Days Ago

  • Alldus Charlotte, NC
  • I am currently seeking a ServiceNow Architect This role requires close partnership and collaboration with other Business Stakeholders and Subject Matter Ex... more
  • 7 Days Ago

  • Alldus Chicago, IL
  • I am currently seeking a ServiceNow Business Process Consultant. This role requires close partnership and collaboration with other Business Stakeholders an... more
  • 7 Days Ago


Not the job you're looking for? Here are some other Software Engineer - ML/LLM Inference jobs in the San Francisco, CA area that may be a better fit.

  • togetherai San Francisco, CA
  • About the Role At Together.ai, we are building state-of-the-art infrastructure to enable efficient and scalable inference for large language models (LLMs).... more
  • 29 Days Ago

  • Together AI San Francisco, CA
  • About The Role At Together.ai, we are building state-of-the-art infrastructure to enable efficient and scalable inference for large language models (LLMs).... more
  • 7 Days Ago
