
Solutions Architect, Inference Deployments

NVIDIA AI
Santa Clara, CA Full Time
POSTED ON 4/26/2026
AVAILABLE BEFORE 5/24/2026
Job Requisition ID: JR2014105
Job Category: Sales
Time Type: Full time

We’re forming a team of innovators to roll out and enhance AI inference solutions at scale, demonstrating NVIDIA’s GPU technology and Kubernetes. As a Solutions Architect focused on inference, you’ll collaborate closely with our engineering and DevOps teams and with customers to develop enterprise AI solutions. Together, we'll deliver generative AI to production!

What You'll Be Doing

  • Build inference pipelines with tools like NVIDIA Dynamo, distributing tasks among GPU workers to improve efficiency.
  • Collaborate with DevOps teams to orchestrate disaggregated inference using Kubernetes for complex workloads.
  • Accelerate inference pipelines using TensorRT-LLM, vLLM, SGLang, and other backends to ensure seamless integration with disaggregated inference.
  • Provide mentorship and technical leadership to customers and internal teams, guiding them through the deployment of disaggregated inference systems and resolving complex issues.

What We Need To See

  • 5 years of experience in solutions architecture, with a proven track record of deploying distributed systems and AI inference workloads on Kubernetes.
  • Experience with one of NVIDIA Dynamo, Triton Inference Server, or TensorRT-LLM for model optimization and serving.
  • Experience with GPU orchestration using NVIDIA GPU Operator, NIM Operator, and Multi-Instance GPU (MIG) partitioning.
  • Experience solving sophisticated problems in GPU allocation, memory hierarchies (HBM, DRAM, SSD), and low-latency networking (RDMA, UCX).
  • Demonstrated success in tuning large language models for low-latency inference in enterprise environments.
  • BS in CS/Engineering or equivalent experience.

Ways To Stand Out From The Crowd

  • Prior experience deploying NVIDIA inference technologies such as Dynamo, NIM, NIXL and Grove.
  • Deep understanding of transformer neural networks and of inference acceleration techniques such as quantization, speculative decoding, and WideEP.
  • NVIDIA Certified AI Engineer or similar credentials.
  • Contributions to open-source projects including NVIDIA Dynamo, vLLM, KServe, or SGLang.

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 152,000 USD - 241,500 USD.

You will also be eligible for equity and benefits.

Applications for this job will be accepted at least until April 19, 2026.

This posting is for an existing vacancy.

NVIDIA uses AI tools in its recruiting processes.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.



