Research Scientist / Engineer — Foundation Model (Voice Agents)

Luma
Palo Alto, CA Full Time
POSTED ON 4/10/2026
AVAILABLE BEFORE 5/27/2026

About Luma AI

Luma’s mission is to build multimodal AGI. Through our research on video, 3D, and now multimodal models at Luma, we believe that AI needs to be jointly trained over all signal modalities – text, video, audio, images – analogous to the human brain.

To advance our mission, we build and operate the full stack end-to-end, spanning foundation models, inference systems, and products. This integrated approach powers technologies like Ray3, which is seeing rapidly growing adoption among Fortune 500 companies across media, entertainment, and advertising. Backed by a recent $900M Series C and our partnership with Humain to build a 2 GW compute supercluster (Project Halo), our models and the Dream Machine platform are now enabling creatives worldwide to tell some of the most impactful stories of our time.

Where You Come In

This is a rare opportunity to work at the absolute frontier of creative AI, building the next generation of interactive voice agents. You will join a foundational team responsible for developing the core models that allow humans to converse with AI in real time with unprecedented realism and expressiveness. Your work will bridge the gap between deep research and magical, shipped products that millions of users will interact with.

What You'll Do

This role involves both the “science” and “engineering” sides of research. It is a multi-stack role in which you will work at the intersection of modeling, data, systems, and evaluation.

  • Modeling: Build next-generation voice agents that tightly integrate audio understanding (e.g., ASR, diarization, emotion recognition) and audio generation (e.g., TTS, voice conversion) for real-time, interactive use.
  • Data: Design, implement, and run robust data pipelines and training curricula for speech and audio, including large-scale pretraining, fine-tuning, and data quality iteration.
  • Systems: Train large-scale video and audio generative models on massive datasets and GPU clusters, and develop low-latency architectures and inference strategies for streaming, conversational, and on-device deployment.
  • Evaluation: Define and build novel evaluation frameworks for voice agents, covering accuracy, robustness, latency, controllability, and human perceptual quality.

Who You Are

  • A strong background in machine learning and generative modeling.
  • Practical understanding of speech and audio modeling, including representation learning, sequence modeling, and conditioning/control mechanisms.
  • Experience building and training models in PyTorch, including large-scale or latency-sensitive systems.

What Sets You Apart (Bonus Points)

  • Experience with speech or audio understanding tasks (e.g., ASR, diarization, speaker/emotion recognition, audio classification).
  • Experience with speech or audio generation (e.g., TTS, voice conversion, expressive or controllable speech).
  • Familiarity with streaming or real-time inference, model compression, or deployment on consumer hardware.
  • A portfolio of past projects, publications, or open-source contributions demonstrating your work in generative audio or speech AI.

Your application is reviewed by real people.

Salary.com Estimation for Research Scientist / Engineer — Foundation Model (Voice Agents) in Palo Alto, CA
$120,608 to $152,432