Founding Backend Engineer - LLM Orchestration

Nestmed
San Francisco, CA · Full-Time
Posted 12/8/2025 · Closed 1/5/2026

Healthcare documentation is broken. Nestmed is fixing it with an AI platform that gives clinicians their time back.

In just one year, we’ve scaled to support tens of thousands of clinicians across more than a million patient visits. We're now the trusted partner for over 60 home health agencies, including 7 of the top 10 enterprises in the US.

Our founding team, hailing from Stanford, YC, Google, and Meta, is backed by the founders of PayPal and Plaid to build the essential infrastructure for the future of the $500B home healthcare industry.

About The Role

As the founding Backend Engineer on our LLM Orchestration team, you'll deploy and manage LLMs at scale, orchestrating them in complex production scenarios that directly impact patient care. You'll rebuild and maintain our core AI inference engine, which powers all of Nestmed's intelligent capabilities across several thousand clinical conversations daily.

Our system orchestrates over a dozen AI models, both fine-tuned in-house models and third-party APIs, with low latency and high availability. You'll work on complex technical challenges like routing requests to models intelligently based on clinical context, implementing sophisticated fallback strategies across multiple providers, optimizing inference costs through batching and caching, and ensuring clinical accuracy through comprehensive model evaluation pipelines.
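To make one of those challenges concrete, here is a minimal sketch of cost optimization through response caching: completions are keyed on a hash of the model and prompt so repeated requests never pay for inference twice. This is an illustration only, not Nestmed's actual stack; the model name and the stand-in LLM function are hypothetical.

```python
import hashlib

class ResponseCache:
    """Cache LLM completions keyed by (model, prompt) so repeat requests skip inference."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def _key(self, model: str, prompt: str) -> str:
        # Hash model and prompt together; \x00 separates the two fields unambiguously.
        return hashlib.sha256(f"{model}\x00{prompt}".encode()).hexdigest()

    def get_or_call(self, model: str, prompt: str, call):
        key = self._key(model, prompt)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        result = call(model, prompt)  # only invoke the provider on a miss
        self._store[key] = result
        return result

# Usage: fake_llm stands in for a real provider call; the second request is served from cache.
cache = ResponseCache()
fake_llm = lambda model, prompt: f"summary of: {prompt}"
a = cache.get_or_call("clinical-notes-v1", "Patient visit 42", fake_llm)
b = cache.get_or_call("clinical-notes-v1", "Patient visit 42", fake_llm)
```

A production version would add TTL-based expiry and an eviction policy, since clinical context can make stale completions unsafe to reuse.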

This isn't about calling OpenAI APIs. You'll build sophisticated orchestration logic that selects optimal models for each clinical task, implements custom retry and circuit breaker patterns for provider failures, manages rate limits across multiple concurrent workflows, and maintains detailed performance metrics across the entire AI pipeline. You'll start as the solo engineer on this critical infrastructure and grow it into a robust team handling core AI engineering.
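The circuit-breaker pattern mentioned above can be sketched in a few lines: after a run of consecutive provider failures, the breaker "opens" and rejects calls immediately instead of hammering a failing provider, then allows a trial call after a cooldown. A minimal sketch, assuming nothing about Nestmed's implementation; the thresholds and the failing stub provider are illustrative.

```python
import time

class CircuitBreaker:
    """Open after `max_failures` consecutive errors; reject calls until
    `reset_after` seconds pass, then permit a half-open trial call."""

    def __init__(self, max_failures=3, reset_after=30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None  # monotonic timestamp when the circuit opened

    @property
    def state(self):
        if self.opened_at is None:
            return "closed"
        if time.monotonic() - self.opened_at >= self.reset_after:
            return "half-open"
        return "open"

    def call(self, fn, *args):
        if self.state == "open":
            raise RuntimeError("circuit open: provider temporarily disabled")
        try:
            result = fn(*args)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            raise
        # A success (including a half-open trial) resets the breaker.
        self.failures = 0
        self.opened_at = None
        return result

# Usage: two consecutive timeouts open the circuit; the next call is rejected fast.
breaker = CircuitBreaker(max_failures=2, reset_after=60.0)
def flaky(model, prompt):
    raise TimeoutError("provider timed out")

for _ in range(2):
    try:
        breaker.call(flaky, "notes-v1", "hi")
    except TimeoutError:
        pass

rejected = False
try:
    breaker.call(flaky, "notes-v1", "hi")
except RuntimeError:
    rejected = True
```

In a multi-provider setup, an open circuit on one provider is typically the trigger for routing to a fallback rather than surfacing an error to the clinician.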

What You'll Do

  • Build and optimize our core AI inference engine that routes requests across multiple LLM providers based on clinical context, cost optimization, and latency requirements
  • Design robust model serving infrastructure with intelligent load balancing, failover mechanisms, and A/B testing frameworks for model evaluation in production
  • Implement production-grade AI pipelines with comprehensive observability, distributed tracing, and real-time performance monitoring for healthcare-critical workloads
  • Optimize inference costs and latency through intelligent request batching, response caching, model quantization, and dynamic provider selection algorithms
  • Build custom model fine-tuning and deployment pipelines for healthcare-specific tasks using frameworks like Transformers, vLLM, and distributed training infrastructure
  • Create sophisticated prompt engineering systems that dynamically optimize prompts based on clinical context and historical model performance data
  • Design comprehensive evaluation frameworks that continuously monitor model accuracy, clinical safety, and regulatory compliance across all deployed models
  • Build model versioning and deployment systems that support safe rollouts, instant rollbacks, and controlled experimentation in production healthcare environments
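One way to picture the routing-with-failover behavior the bullets above describe: a router holds an ordered list of candidate models per clinical task and falls through to the next provider when a call fails. This is a hedged sketch, not Nestmed's design; every task and model name here is hypothetical.

```python
class ModelRouter:
    """Route each clinical task to a preferred model, falling back in
    priority order when a provider call raises."""

    def __init__(self, routes):
        # routes: task name -> ordered list of (model_name, call_fn) pairs
        self.routes = routes

    def complete(self, task, prompt):
        errors = []
        for model, call in self.routes.get(task, []):
            try:
                return model, call(prompt)  # first success wins
            except Exception as exc:
                errors.append((model, exc))  # record and try the next provider
        raise RuntimeError(f"all providers failed for task {task!r}: {errors}")

# Usage with stub providers: the in-house model is rate limited, the vendor model answers.
def primary(prompt):
    raise TimeoutError("rate limited")

def fallback(prompt):
    return f"ok: {prompt}"

router = ModelRouter({
    "visit-summary": [("inhouse-ft-v2", primary), ("vendor-large", fallback)],
})
model, text = router.complete("visit-summary", "summarize visit")
```

A real router would also weigh cost, latency budgets, and per-provider circuit state when ordering candidates, rather than using a static list.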

What You Bring

  • 6 years of backend engineering experience building high-performance distributed systems, with a focus on latency-critical applications and reliability engineering
  • Deep production experience with LLMs including multi-provider orchestration, custom model serving, and building reliable inference infrastructure at scale
  • Strong expertise in ML infrastructure including model serving frameworks (TensorRT, vLLM, TorchServe), distributed training, and GPU optimization
  • Experience with model evaluation and monitoring including A/B testing frameworks, performance monitoring, and building comprehensive observability for ML systems
  • Proficiency in Python and ML frameworks with hands-on experience in model fine-tuning, prompt engineering, and deploying custom models to production
  • Track record scaling ML systems with experience optimizing inference costs, managing multiple model providers, and building reliable AI infrastructure
  • Understanding of healthcare or regulated industries where model accuracy, auditability, and compliance are mission-critical requirements
  • San Francisco-based and excited about working closely with AI researchers to productionize cutting-edge models for healthcare applications

Why This Role Matters

You'll be building the AI infrastructure that processes millions of patient interactions, directly impacting care quality for thousands of patients daily. Every optimization you make reduces healthcare costs, improves clinical accuracy, and enables new AI capabilities that transform patient outcomes.

You'll start as the founding ML infrastructure engineer and build this into a world-class AI platform team. Join us in San Francisco to build the most sophisticated LLM orchestration system in healthcare alongside leading AI researchers and clinical experts.

If you’re passionate about building high-impact products that solve real-world problems, we’d love to hear from you. Apply today!

Compensation Range: $200K - $350K

