Site Reliability Engineer | AI Supercomputing

Luma AI
Palo Alto, CA | Full Time
POSTED ON 12/28/2025
AVAILABLE BEFORE 2/3/2026
The Opportunity

Luma AI is building the engine for multimodal general intelligence. To teach models to understand the world through video, audio, and images, we operate at the absolute frontier of computing power. We have secured the capital to deploy massive-scale GPU clusters that rival the world's largest supercomputers, while maintaining the agility of a focused engineering lab. This role places you at the intersection of hardware and software, where you architect the physical and digital foundation of AGI.

Where You Come In

You will serve as a technical authority on the systems that power our research and product velocity. This is a role for a builder who prefers bare metal to managed services and understands that at our scale, standard cloud abstractions break down. You will architect, optimize, and maintain the massive, multi-vendor GPU supercomputers required to train our foundational models.

What You Will Build

  • Supercomputing Architecture: Design and deploy high-performance clusters combining thousands of GPUs, CPUs, and high-throughput networking to maximize training efficiency.
  • The Network Layer: Optimize low-level networking (InfiniBand, RDMA) to ensure seamless communication between accelerators, eliminating bottlenecks in distributed training jobs.
  • Hardware-Software Synthesis: Collaborate with hardware partners to push the boundaries of what is possible, debugging failures at the intersection of the kernel, driver, and silicon.

The Profile We Are Looking For

  • HPC Authority: You possess elite knowledge of high-performance computing (HPC), including job schedulers and the nuances of GPU architecture.
  • Deep Systems Fluency: You are comfortable navigating the Linux terminal to solve complex performance issues, utilizing tools like perf and strace to optimize at the OS level.
  • First-Principles Engineering: You have a history of building infrastructure from the ground up, demonstrating the ability to design systems where no playbook currently exists.

Salary: $170,000 - $360,000


