Machine Learning Engineer - Multi-Modality Foundation Model

Zoox
Foster, CA Full Time
POSTED ON 4/26/2026
AVAILABLE BEFORE 6/10/2026
The Perception team is pioneering the development of a multi-modality foundation model to drive the next generation of autonomous system intelligence. As a Multi-modality Foundation Model Engineer, you will focus on building highly efficient, production-ready multi-modality models. We are looking for experts who have hands-on experience building multi-modality foundation models—whether that involves AV-centric modalities (Vision, LiDAR, Radar) or broader domains (Vision, Language, Text, Audio). You will design, train, and deploy these models using Knowledge Distillation (KD) to transfer capabilities from large-scale proprietary teacher models to efficient student models capable of real-time, on-vehicle inference.

In This Role, You Will

  • Build, pre-train, and evaluate large-scale multi-modality foundation models from the ground up, successfully aligning diverse data streams (e.g., Vision, LiDAR, Radar, Language, Audio).
  • Define and execute the ML roadmap for deploying these multi-modality representations to the vehicle.
  • Architect and implement Knowledge Distillation pipelines to compress large-capacity multi-modal teacher models into highly efficient, production-ready student models.
  • Build high-quality training and evaluation datasets, applying advanced data-centric techniques to maximize cross-modal representation learning and student model convergence.
  • Collaborate with downstream perception teams to integrate and validate the performance, robustness, and latency of your models in on-board production systems.

Qualifications

  • MS or PhD in Computer Science, Machine Learning, or a related technical field with demonstrated professional experience.
  • Deep, proven expertise in building and training large-scale multi-modality foundation models (e.g., Vision-Language Models (VLMs), Vision-Audio-Text, or Vision-LiDAR-Radar architectures).
  • Strong understanding of cross-modal alignment, multi-modal attention mechanisms, and large-scale pre-training techniques.
  • Proven experience in Knowledge Distillation (KD), model compression, and training highly efficient student models for production environments.
  • Proficiency in ML frameworks (e.g., PyTorch) and experience building large-scale ML training and evaluation pipelines.

Bonus Qualifications

  • Experience in the Autonomous Driving or robotics industry.
  • Experience with model deployment, optimization, and hardware constraints (e.g., C++ for inference, TensorRT, quantization, pruning).
  • Publications in top-tier conferences (CVPR, ICCV, NeurIPS, ICLR, ACL) related to multi-modality foundation models, cross-modal learning, or model compression.

$189,000 - $258,000 a year

Base Salary Range

There are three major components to compensation for this position: salary, Amazon Restricted Stock Units (RSUs), and Zoox Stock Appreciation Rights. A sign-on bonus may be offered as part of the compensation package. The listed range applies only to the base salary. Compensation will vary based on geographic location and level. Leveling, as well as positioning within a level, is determined by a range of factors, including, but not limited to, a candidate's relevant years of experience, domain knowledge, and interview performance. The salary range listed in this posting is representative of the range of levels Zoox is considering for this position.

Zoox also offers a comprehensive package of benefits, including paid time off (e.g. sick leave, vacation, bereavement), unpaid time off, Zoox Stock Appreciation Rights, Amazon RSUs, health insurance, long-term care insurance, long-term and short-term disability insurance, and life insurance.

About Zoox

Zoox is developing the first ground-up, fully autonomous vehicle fleet and the supporting ecosystem required to bring this technology to market. Sitting at the intersection of robotics, machine learning, and design, Zoox aims to provide the next generation of mobility-as-a-service in urban environments. We’re looking for top talent that shares our passion and wants to be part of a fast-moving and highly execution-oriented team.

Accommodations

If you need an accommodation to participate in the application or interview process, please reach out to [email protected] or your assigned recruiter.

A Final Note

You do not need to match every listed expectation to apply for this position. Here at Zoox, we know that diverse perspectives foster the innovation we need to be successful, and we are committed to building a team that encompasses a variety of backgrounds, experiences, and skills.

We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, or assessing responses. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are ultimately made by humans. If you would like more information about how your data is processed, please contact us.
