Research Scientist / Engineer – Multimodal Capabilities

Luma AI
Palo Alto, CA · Full Time
POSTED ON 1/4/2026
AVAILABLE BEFORE 3/15/2026
About Luma AI

Luma's mission is to build multimodal AI to expand human imagination and capabilities. We believe that multimodality is critical for intelligence. To go beyond language models and build more aware, capable, and useful systems, the next step function change will come from vision. So we are working on training and scaling up multimodal foundation models for systems that can see and understand, show and explain, and eventually interact with our world to effect change.

Where You Come In

This is a high-impact opportunity to define the future of what our models can do. As a first-principles researcher, you will tackle the most ambitious questions at the heart of our mission: how can the fusion of vision, audio, and language unlock entirely new, magical behaviors in AI? You will not just be improving existing systems; you will be charting the course for the next generation of model capabilities, designing the core experiments that will shape the future of our technology and products.

What You'll Do

  • Research and Define the next frontier of multimodal capabilities, identifying key gaps in our current models and designing the experiments to solve them.
  • Design and Execute novel experiments, datasets, and methodologies to systematically improve model performance across vision, audio, and language.
  • Develop and Pioneer new evaluation frameworks and benchmarking approaches to precisely measure novel multimodal behaviors and capabilities.
  • Collaborate Deeply with other research teams to translate your findings into our core training recipes and unlock new product experiences.
  • Build and Prototype compelling demonstrations that showcase the groundbreaking multimodal capabilities you have unlocked.

Who You Are

We're seeking a first-principles researcher with a deep curiosity to push the boundaries of what AI can achieve.

  • You have a PhD or equivalent research experience in a field related to AI, Machine Learning, or Computer Science.
  • You have strong programming skills in Python and deep, hands-on experience with PyTorch.
  • You have a proven track record of working with multimodal data pipelines and curating large-scale datasets for research.
  • You possess a deep, fundamental understanding of at least one of the core modalities: computer vision, audio processing, or natural language processing.
  • You thrive on tackling the most ambitious, open-ended research challenges in a fast-paced, collaborative environment.

What Sets You Apart (Bonus Points)

  • Direct expertise working with complex, interleaved multimodal data (video, audio, text).
  • Hands-on experience training or fine-tuning Vision Language Models (VLMs), Audio Language Models, or large-scale generative video models from scratch.
  • A strong publication record in top-tier AI conferences (e.g., NeurIPS, ICML, CVPR, ICLR).
  • Experience leading ambitious, open-ended research projects from ideation to tangible results.

Salary.com Estimation for Research Scientist / Engineer – Multimodal Capabilities in Palo Alto, CA
$95,536 to $118,404


