
Software Engineering – Inference Engineer

Virtue AI
San Francisco, CA Full Time
POSTED ON 12/21/2025
AVAILABLE BEFORE 1/19/2026
About Virtue AI

Virtue AI sets the standard for advanced AI security platforms. Built on decades of foundational and award-winning research in AI security, its AI-native architecture unifies automated red-teaming, real-time multimodal guardrails, and systematic governance for enterprise apps and agents. Deploy in minutes—across any environment—to keep your AI protected and compliant. We are a well-funded, early-stage startup founded by industry veterans, and we're looking for passionate builders to join our core team.

What You’ll Do

As an Inference Engineer, you will own how models are served in production. Your job is to make inference fast, stable, observable, and cost-efficient, even under unpredictable workloads.

You will:

  • Serve and optimize inference for LLMs, embedding models, and other ML models across multiple model families
  • Design and operate inference APIs with clear contracts, versioning, and backward compatibility
  • Build routing and load-balancing logic for inference traffic
    • Multi-model routing
    • Fallback and degradation strategies
    • vLLM or SGLang
  • Package inference services into production-ready Docker images
  • Implement logging and metrics for inference systems
    • Latency, throughput, token counts, GPU utilization
    • Prometheus-based metrics
  • Analyze server uptime and failure modes
    • GPU OOMs, hangs, slowdowns, fragmentation
    • Recovery and restart strategies
  • Design GPU and model placement strategies
    • Model sharding, replication, and batching
    • Tradeoffs between latency, cost, and availability
  • Work closely with backend, platform (Cloud, DevOps), and ML teams to align inference behavior with product requirements

What Makes You a Great Fit

You understand that inference is a systems problem, not just a model problem. You think in QPS, p99 latency, GPU memory, and failure domains.

Required Qualifications

  • Bachelor’s degree or higher in CS, CE, or a related field
  • Strong experience serving LLMs and embedding models in production
  • Hands-on experience designing:
    • Inference APIs
    • Load balancing and routing logic
  • Experience with SGLang, vLLM, TensorRT, or similar inference frameworks
  • Strong understanding of GPU behavior
    • Memory limits, batching, fragmentation, utilization
  • Experience with:
    • Docker
    • Prometheus metrics
    • Structured logging
  • Ability to debug and fix real inference failures in production
  • Experience with autoscaling inference services
  • Familiarity with Kubernetes GPU scheduling
  • Experience supporting production systems with real SLAs
  • Comfortable operating in a fast-paced startup environment with high ownership

Preferred Qualifications

  • Experience with GPU-level optimization
    • Memory planning and reuse
    • Kernel launch efficiency
    • Reducing fragmentation and allocator overhead
  • Experience with kernel- or runtime-level optimization
    • CUDA kernels, Triton kernels, or custom ops
  • Experience with model-level inference optimization
    • Quantization (FP8 / INT8 / BF16)
    • KV-cache optimization
    • Speculative decoding or batching strategies
  • Experience pushing inference efficiency boundaries (latency, throughput, or cost)

Why Join Virtue AI

  • Competitive base salary and equity, commensurate with skills and experience.
  • Impact at scale – Help define the category of AI security and partner with Fortune 500 enterprises on their most strategic AI initiatives.
  • Work on the frontier – Engage with bleeding-edge AI/ML and deploy AI security solutions for use cases that don't exist anywhere else yet.
  • Collaborative culture – Join a team of builders, problem-solvers, and innovators who are mission-driven and collaborative.
  • Opportunity for growth – Shape not only our customer engagements, but also the processes and culture of an early, lean team with plans to scale.

Equal Opportunity Employment

Virtue AI is an Equal Opportunity Employer. We welcome and celebrate diversity and are committed to creating an inclusive workplace for all employees. Employment decisions are made without regard to race, color, religion, sex, gender identity or expression, sexual orientation, marital status, national origin, ancestry, age, disability, medical condition, veteran status, or any other status protected by law.

We also provide reasonable accommodations for applicants and employees with disabilities or sincerely held religious beliefs, consistent with legal requirements.

Salary.com Estimation for Software Engineering – Inference Engineer in San Francisco, CA
$124,263 to $150,489

