What are the responsibilities and job description for the Senior Software Engineer, Vision Language Models position at Motional?

Mission Summary

At Motional, data play a critical role in fueling our ML-centered autonomous driving vehicle. Our robo-taxi fleet collects petabytes of data on the road every day – the Data Mining team is mining & filtering the massive influx of fleet data by developing billion-scale data workflows and state-of-the-art mining algorithms. Through our mining and learning frameworks we continuously improve the on-road performance of ML products for perception, prediction & planning with every mile driven.

We mine for model errors, anomalies, rare objects & long-tail driving scenarios across millions of driving hours – these are used for laser-focused ML model training and continuous edge case validation. We are looking for an engineer to spearhead new mining strategies & workflows and help deliver high-quality data that improve our core ML products.

What you'll be doing:

Spearhead the development of cutting-edge data products by adapting and extending Vision-Language Models (VLMs) and other multimodal foundation models. This includes applying advanced techniques like fine-tuning, RAG, in-context learning, continual pre-training, and knowledge distillation.
Design and curate high-quality multimodal datasets crucial for training and evaluating multimodal foundation models. This includes developing innovative strategies for data curation, dataset creation, and synthetic data generation to optimize multimodal foundation models for long-tail event mining.
Drive the in-depth analysis of multimodal foundation models' performance, generalization, and robustness in diverse real-world settings.

What we're looking for:

MS/PhD in computer science or related fields with a strong emphasis on multimodal foundation models
Strong publication record in premier conferences (e.g., CVPR, ICCV, ECCV, NeurIPS, ICML, ICLR) demonstrating significant contributions to the field of vision-language understanding or multimodal foundation models
Proficiency in Python and deep learning frameworks such as PyTorch, with a demonstrated ability to write clean, efficient, and maintainable code

Bonus points (not required):

Experience in the application of Vision-Language Models (VLMs) or other multimodal foundation models to data mining in real-world settings
Experience in production deployment of Vision-Language Models (VLMs) or other multimodal foundation models for real-world applications (e.g., image/video captioning, open-vocabulary image/video searching)
Experience with data from diverse sensor modalities (e.g., camera, lidar, radar)
Experience in applied machine learning for autonomous driving

Motional is a driverless technology company making autonomous vehicles a safe, reliable, and accessible reality. We're driven by something more.

Our journey is always people first.

We aren't just developing driverless cars; we're creating safer roadways and more equitable transportation options, and making our communities better places to live, work, and connect. Our team is made up of engineers, researchers, innovators, dreamers and doers, who are creating a technology with the potential to transform the way we move.

Higher purpose, greater impact.

We're creating first-of-its-kind technology that will transform transportation. To do so successfully, we must design for everyone in our cities and on our roads. We believe in building a great place to work through a progressive, global culture that is diverse and inclusive and that ensures people feel valued at every level of the organization. Diversity helps us to see the world differently; it's not only good for our business, it's the right thing to do.

Scale up, not starting up.

Our team is behind some of the industry's largest leaps forward, including the first fully autonomous cross-country drive in the U.S., the launch of the world's first robotaxi pilot, and operation of the world's longest-standing public robotaxi fleet. We're driven to scale; we're moving towards commercialization of our technology, and we need team members who are ready to embrace change and challenges.

Formed as a joint venture between Hyundai Motor Group and Aptiv, Motional is fundamentally changing how people move through their lives. Headquartered in Boston, Motional has operations in the U.S. and Asia. For more information, visit www.Motional.com and follow us on Twitter, LinkedIn, Instagram and YouTube.

Motional AD Inc. is an EOE. We celebrate diversity and are committed to creating an inclusive environment for all employees. To comply with Federal Law, we participate in E-Verify. All newly hired employees are queried through this electronic system established by the DHS and the SSA to verify their identity and employment eligibility.

Salary: $175,000 - $234,000
