
Research Scientist / Engineer - Video Generation Modeling

Gigascale Capital
Palo Alto, CA Full Time
POSTED ON 5/25/2026
AVAILABLE BEFORE 6/24/2026
Location

Palo Alto

Employment Type

Full time

Department

Research


At Rhoda AI, we're building the next generation of generalist intelligent robots. We own the full robotics stack, from high-performance hardware and robot systems to the infrastructure and state-of-the-art foundation world models that control our robots. Our robots are designed to be generalists, capable of operating in complex real-world environments and handling long-tail edge cases, made possible by our cutting-edge research and end-to-end system design. We've raised over $400M and are investing aggressively in model research, infrastructure, hardware development, and manufacturing scale-up to make generalist robotics a reality.

We're looking for Research Scientists and Research Engineers to push the frontier of large-scale pretraining for our video action model. Our approach formulates robot control as video prediction: we pretrain causal video generation models on web-scale video data, then adapt them to predict robot actions from real-world demonstrations. You'll work on the core architectures, training objectives, and scaling strategies that determine how well our models learn from internet-scale video. We hire across levels, from senior to staff, and welcome both research-track and engineering-track candidates.

What You'll Do

  • Design and train large-scale causal video generation models on web-scale video data
  • Develop and validate training objectives, model architectures, and data mixtures for video prediction at scale
  • Research scaling laws and data efficiency for web-scale video pretraining
  • Investigate what properties of web video transfer most effectively to robotic control and action prediction
  • Build systematic evaluations to measure video generation quality, long-horizon prediction fidelity, and downstream robot task performance
  • Run rigorous ablations and benchmarking to understand what drives model quality at scale
  • Collaborate closely with data & evaluation, post-training, and training systems teams to translate research ideas into working systems
  • Publish and present work at top-tier ML and robotics venues (especially valued for the Research Scientist track)

What We're Looking For

  • Strong background in large-scale generative modeling — either video generation (autoregressive video models, diffusion transformers, causal video architectures) or language model pretraining (LLMs, autoregressive transformers at scale)
  • Hands-on experience training large generative models from scratch at scale
  • Deep understanding of autoregressive modeling, causal architectures, and scaling behavior
  • Fluency with modern ML frameworks (PyTorch required; JAX a plus)
  • Ability to design experiments, interpret results, and iterate quickly
  • Strong research taste: ability to identify high-leverage questions and cut through noise
  • Comfort operating in a fast-moving, ambiguous startup environment
  • Staff-level candidates are expected to define technical direction and drive research strategy independently; senior/MTS candidates execute complex projects with strong fundamentals and growing scope

Nice To Have (But Not Required)

  • PhD in ML, CS, Robotics, or a related field — or equivalent research/industry experience
  • Strong publication record at NeurIPS, ICML, ICLR, CVPR, CoRL, etc. (especially valued for the RS track)
  • Prior work specifically on video generation models (autoregressive video, diffusion transformers, world models, or causal video architectures)
  • Experience with large-scale autoregressive language model pretraining and scaling
  • Familiarity with web-scale video datasets and video data curation pipelines
  • Prior work connecting video generation to control, action prediction, or robotic learning
  • Familiarity with distributed training and multi-node infrastructure

Why This Role

  • Work on a fundamentally different approach to robot learning — web-scale video pretraining rather than robot-data-only VLA models
  • Your models give our robots the ability to understand and predict the visual world from internet-scale supervision
  • Direct collaboration with data, post-training, and deployment teams with no silos
  • High ownership and fast iteration in a small, elite team


Salary.com Estimation for Research Scientist / Engineer - Video Generation Modeling in Palo Alto, CA
$140,582 to $175,820