
Senior Software Engineer, Vision Language Models

Motional
Boston, MA | Full Time
POSTED ON 12/17/2025
AVAILABLE BEFORE 2/12/2026

Mission Summary

At Motional, data play a critical role in fueling our ML-centered autonomous driving vehicles. Our robotaxi fleet collects petabytes of data on the road every day. The Data Mining team mines and filters this massive influx of fleet data by developing billion-scale data workflows and state-of-the-art mining algorithms. Through our mining and learning frameworks, we continuously improve the on-road performance of ML products for perception, prediction & planning with every mile driven.

We mine for model errors, anomalies, rare objects & long-tail driving scenarios across millions of driving hours. These are used for laser-focused ML model training and continuous edge-case validation. We are looking for an engineer to spearhead new mining strategies & workflows and help deliver high-quality data that improve our core ML products.

What you'll be doing:

  • Spearhead the development of cutting-edge data products by adapting and extending Vision-Language Models (VLMs) and other multimodal foundation models. This includes applying advanced techniques like fine-tuning, RAG, in-context learning, continual pre-training, and knowledge distillation.
  • Design and curate high-quality multimodal datasets crucial for training and evaluating multimodal foundation models. This includes developing innovative strategies for data curation, dataset creation, and synthetic data generation to optimize multimodal foundation models for long-tail event mining.
  • Drive the in-depth analysis of multimodal foundation models' performance, generalization, and robustness in diverse real-world settings.

What we're looking for:

  • MS/PhD in computer science or related fields with a strong emphasis on multimodal foundation models
  • Strong publication record in premier conferences (e.g., CVPR, ICCV, ECCV, NeurIPS, ICML, ICLR) demonstrating significant contributions to the field of vision-language understanding or multimodal foundation models
  • Proficiency in Python and deep learning frameworks such as PyTorch, with a demonstrated ability to write clean, efficient, and maintainable code

Bonus points (not required):

  • Experience in the application of Vision-Language Models (VLMs) or other multimodal foundation models to data mining in real-world settings
  • Experience in production deployment of Vision-Language Models (VLMs) or other multimodal foundation models for real-world applications (e.g., image/video captioning, open-vocabulary image/video searching)
  • Experience with data from diverse sensor modalities (e.g., camera, lidar, radar)
  • Experience in applied machine learning for autonomous driving

Motional is a driverless technology company making autonomous vehicles a safe, reliable, and accessible reality. We're driven by something more.

Our journey is always people first.

We aren't just developing driverless cars; we're creating safer roadways, more equitable transportation options, and making our communities better places to live, work, and connect. Our team is made up of engineers, researchers, innovators, dreamers and doers, who are creating a technology with the potential to transform the way we move.

Higher purpose, greater impact.

We're creating first-of-its-kind technology that will transform transportation. To do so successfully, we must design for everyone in our cities and on our roads. We believe in building a great place to work through a progressive, global culture that is diverse, inclusive, and ensures people feel valued at every level of the organization. Diversity helps us to see the world differently; it's not only good for our business, it's the right thing to do.

Scale up, not starting up.

Our team is behind some of the industry's largest leaps forward, including the first fully autonomous cross-country drive in the U.S., the launch of the world's first robotaxi pilot, and operation of the world's longest-standing public robotaxi fleet. We're driven to scale; we're moving towards commercialization of our technology, and we need team members who are ready to embrace change and challenges.

Formed as a joint venture between Hyundai Motor Group and Aptiv, Motional is fundamentally changing how people move through their lives. Headquartered in Boston, Motional has operations in the U.S. and Asia. For more information, visit www.Motional.com and follow us on Twitter, LinkedIn, Instagram and YouTube.

Motional AD Inc. is an EOE. We celebrate diversity and are committed to creating an inclusive environment for all employees. To comply with Federal Law, we participate in E-Verify. All newly hired employees are queried through this electronic system established by the DHS and the SSA to verify their identity and employment eligibility.

Salary: $175,000 - $234,000




