
AI Inference Engineer - Model Optimization & Deployment

Zoox
San Diego, CA Full Time
POSTED ON 4/14/2026
AVAILABLE BEFORE 6/12/2026

The Perception team is pioneering the development of a multi-modality foundation model to drive the next generation of autonomous system intelligence.


As a Model Optimization & Deployment Engineer, you will focus on bringing highly efficient, production-ready large-scale models to our on-vehicle stack. We are looking for experts with hands-on experience in compressing, accelerating, and deploying complex models (LLMs, VLMs, or FMs) for power- and thermal-constrained vehicle SoCs. You will optimize ML models, write custom CUDA kernels, and build highly concurrent inference code to ensure real-time, deterministic execution on edge devices.



In this role, you will:
  • Optimize large-scale models (LLMs, VLMs) using advanced quantization (PTQ, QAT), mixed-precision inference workflows, and parameter-efficient fine-tuning (LoRA, QLoRA).
  • Architect and implement model conversion and compilation pipelines using TensorRT and TensorRT-LLM for edge deployment.

  • Perform rigorous parity checking, accuracy recovery, and latency benchmarking between the original PyTorch models and the compiled edge binaries.

  • Write and optimize custom CUDA kernels and TensorRT Plugins to maximize memory bandwidth and minimize latency on AI accelerators.

  • Write production-level, highly concurrent, and memory-safe C++ and Python code for real-time inference on vehicle SoCs.



Qualifications:
  • Deep expertise in model quantization (PTQ, QAT) and mixed-precision inference workflows (INT8, FP8, INT4, BF16/FP16).

  • Proven experience optimizing large-scale models (LLMs, VLMs, or VLAs) using KV-cache optimization (e.g., PagedAttention), speculative decoding, and efficient attention mechanisms (FlashAttention, linear attention).

  • Extensive experience with model conversion/compilation pipelines (TensorRT, TensorRT-LLM) and performing rigorous parity/latency benchmarking.

  • Proficiency in low-level programming for AI accelerators, specifically writing and optimizing custom CUDA kernels and TensorRT Plugins.

  • Production-level C++ (14/17/20) and Python programming skills, with experience writing concurrent, memory-safe, real-time inference code for edge devices.


Bonus Qualifications:
  • Experience with distributed training pipelines and model/tensor parallelism (PyTorch Distributed, Ray, DeepSpeed, Megatron-LM) and runtime efficiency optimization for GPU clusters.

  • Familiarity with autonomous driving perception stacks (temporal 3D object detection, BEV, 3D Occupancy Networks) and processing multi-modal sensor streams (Vision, LiDAR, Radar).

  • Understanding of end-to-end autonomous driving paradigms (VLA models, closed-loop simulation validation).


$242,000 - $290,000 a year
Base Salary Range
 
There are three major components to compensation for this position: salary, Amazon Restricted Stock Units (RSUs), and Zoox Stock Appreciation Rights. A sign-on bonus may be offered as part of the compensation package. The listed range applies only to the base salary. Compensation will vary based on geographic location and level. Leveling, as well as positioning within a level, is determined by a range of factors, including, but not limited to, a candidate's relevant years of experience, domain knowledge, and interview performance. The salary range listed in this posting is representative of the range of levels Zoox is considering for this position.
 
Zoox also offers a comprehensive package of benefits, including paid time off (e.g. sick leave, vacation, bereavement), unpaid time off, Zoox Stock Appreciation Rights, Amazon RSUs, health insurance, long-term care insurance, long-term and short-term disability insurance, and life insurance.

About Zoox

Zoox is developing the first ground-up, fully autonomous vehicle fleet and the supporting ecosystem required to bring this technology to market. Sitting at the intersection of robotics, machine learning, and design, Zoox aims to provide the next generation of mobility-as-a-service in urban environments. We’re looking for top talent that shares our passion and wants to be part of a fast-moving and highly execution-oriented team.




Accommodations

If you need an accommodation to participate in the application or interview process, please reach out to accommodations@zoox.com or your assigned recruiter.


A Final Note:

You do not need to match every listed expectation to apply for this position. Here at Zoox, we know that diverse perspectives foster the innovation we need to be successful, and we are committed to building a team that encompasses a variety of backgrounds, experiences, and skills.



