
Member of Technical Staff (MTS) - Multimodal Foundation Models

Deeproute.ai
Fremont, CA · Full Time
POSTED ON 5/28/2026
AVAILABLE BEFORE 7/28/2026

Focus

Multimodal Foundation Models · Representation Learning · Method Innovation

We are looking for strong technical builders and researchers whose understanding of foundation models and representation learning goes beyond applying existing frameworks.

Ideal candidates should have:

  • Strong experimental rigor
  • Solid systems and modeling intuition
  • Hands-on engineering ability
  • Interest in scalable multimodal AI systems for real-world autonomy

We value people who can bridge research and production, and who care about robustness, scalability, efficiency, and practical deployment in large-scale autonomous driving systems.

Responsibilities

1. Large-Scale Foundation Model Pretraining

  • Develop scalable pretraining pipelines for large-scale multimodal driving data
  • Design and optimize training strategies for:
    • Vision-language-action models
    • Video foundation models
    • Long-context temporal modeling
    • Multimodal representation alignment
  • Improve:
    • Training stability
    • Data efficiency
    • Scaling efficiency
    • Representation robustness
  • Work on distributed training systems and large-scale model optimization using frameworks such as:
    • PyTorch Distributed
    • DeepSpeed
    • Megatron-LM

2. Representation Learning & Method Innovation

  • Design and improve self-supervised and multimodal learning methods for real-world autonomous driving systems
  • Conduct architecture-level research on:
    • Vision Transformers (ViT)
    • Video / temporal architectures
    • Multimodal fusion and alignment
    • Embedding and retrieval systems
    • Long-context and memory-efficient architectures
  • Explore and improve:
    • Pretraining objectives
    • Loss functions
    • Training paradigms
    • Generalization and robustness
  • Analyze model behavior through:
    • Rigorous ablation studies
    • Failure case analysis
    • Representation probing and evaluation

3. Efficient Foundation Models & Scalable Deployment

  • Improve the efficiency, scalability, and deployability of large multimodal foundation models for real-world autonomous driving systems
  • Work on areas such as:
    • Model quantization
    • Knowledge distillation
    • Efficient attention mechanisms
    • Sparse architectures and Mixture-of-Experts (MoE)
    • Long-context and memory-efficient modeling
    • Inference acceleration and serving optimization
    • Training and inference system efficiency
  • Optimize model throughput, latency, memory usage, and deployment performance for large-scale production environments

Qualifications

  • MS or PhD in:
      • Computer Vision
      • Machine Learning
      • Robotics
      • Computer Science
      • Related fields
  • Strong understanding of:
      • Foundation models
      • Self-supervised learning
      • Representation learning
      • Multimodal learning
      • Large-scale pretraining
  • Hands-on experience with methods such as:
      • CLIP
      • DINO / DINOv2
      • MAE
      • Contrastive learning
      • Masked modeling
      • MoE or scalable transformer architectures
  • Experience with one or more of the following is highly valued:
      • Video foundation models
      • Long-context modeling
      • Retrieval systems
      • Efficient inference
      • Distributed training
      • Model compression and deployment optimization
  • Strong publication record in top-tier venues is preferred:
      • CVPR
      • ICCV
      • ECCV
      • NeurIPS
      • ICLR
      • ICML

    Salary.com Estimation for Member of Technical Staff (MTS) - Multimodal Foundation Models in Fremont, CA
    $102,642 to $128,337


