Senior ML Infrastructure Engineer (PyTorch, Kubernetes, GPU Training)

Finoit Inc.
Redwood City, CA (Full-Time)
POSTED ON 6/21/2026
AVAILABLE BEFORE 7/20/2026

Short Job Description

We are seeking a Senior ML Infrastructure Engineer to design and scale the infrastructure powering large-scale machine learning training workloads. In this role, you'll build high-performance GPU training platforms, optimize distributed training pipelines, and improve the developer experience for ML researchers.

Responsibilities:

  • Design and scale distributed ML training infrastructure for large GPU clusters.
  • Build and optimize training pipelines using PyTorch, DeepSpeed, and distributed training frameworks.
  • Develop and maintain job scheduling systems using Kubernetes and/or SLURM.
  • Create high-throughput data pipelines for large-scale multimodal datasets.
  • Optimize GPU utilization, memory efficiency, and overall system performance.
  • Build low-latency inference pipelines for production ML deployments.

Required Skills:

  • 7 years of experience in ML Infrastructure, HPC, or Distributed Systems.
  • Strong experience with PyTorch, DeepSpeed, FSDP, ZeRO, or similar distributed training frameworks.
  • Hands-on experience with Kubernetes, cloud platforms (AWS/Google Cloud Platform), and containerized environments.
  • Strong understanding of distributed systems, GPU optimization, NCCL, memory management, and performance tuning.
  • Experience building scalable ML infrastructure from development through production.

Location: Redwood City, CA (On-site)
Employment Type: Full-Time

Nice to Have:

  • Experience with multimodal AI, robotics data pipelines, Triton, TensorRT, custom ML kernels, or ML compiler/runtime optimization.

Salary: $250,000 - $320,000
