What are the responsibilities and job description for the Machine Learning Engineer, Applied AI Infrastructure position at Orbifold AI?

About Orbifold AI

Orbifold AI is building the foundational infrastructure that the next generation of physical AI runs on. We work directly with leading robotics and world model research teams. Our work spans evaluation, model training, reinforcement learning, and the multimodal data systems that fuel them — one integrated research loop.

The bottleneck for physical AI is no longer model scale or computation. It is whether evaluation, training, and data can close the loop tightly enough to drive real progress. That loop is itself the infrastructure the next generation of physical AI will stand on, and it is what we are building.

Role Overview

We are hiring a Machine Learning Engineer to scale and optimize the ML infrastructure behind our pipelines. We process massive volumes of multimodal data — video, image, sensor, action — for some of the most demanding physical AI and world model teams in the field. Our foundation is built on PyTorch and Ray.

You will own the systems that turn raw multimodal data into the training, evaluation, and RL signals our partners depend on. Your work is the bridge between our research and our distributed compute infrastructure: making the pipelines performant, fault-tolerant, and ready to scale to the next order of magnitude.

This is highly applied infrastructure work with direct impact on what our partner models can do in the real world.

What You Will Work On

Architect, build, and optimize distributed ML pipelines on Ray (Ray Core, Ray Train, Ray Serve) and PyTorch, designed for the demands of multimodal video, image, and sensor data at scale
Profile and tune distributed training jobs and inference deployments to maximize GPU/CPU utilization and reduce latency
Build robust abstractions and internal tools that let our researchers and product engineers deploy PyTorch models onto our Ray clusters seamlessly
Design and maintain high-throughput video processing pipelines (e.g. FFmpeg, NVDEC/NVENC, frame-level indexing) that feed our curation, training, and evaluation workloads
Ensure the high availability, fault tolerance, and observability of our distributed compute systems
Build the serving infrastructure for our evaluation harnesses, verification models, and RL environments
Collaborate with research, data, and product engineering teams to translate modeling constraints into scalable infrastructure solutions

What We Are Looking For

3+ years of software engineering experience with a strong focus on backend, distributed systems, or ML infrastructure
Strong proficiency in Python and in writing production-grade code
Deep practical knowledge of PyTorch — including model serving, data loading bottlenecks, and memory management
Hands-on experience with Ray for scaling Python and machine learning applications
Solid understanding of distributed systems concepts: networking, concurrency, fault tolerance, parallel processing
Comfortable owning systems end to end in fast-paced applied research or startup environments

Nice to Have

Experience with large-scale video or multimodal data pipelines (e.g. FFmpeg, NVDEC/NVENC, 3D / point cloud handling)
Experience with cloud-native infrastructure (Kubernetes, Docker) and major cloud providers (AWS, GCP, Azure)
Hardware accelerator experience (GPUs, TPUs) and low-level optimization (CUDA, C++)
Background in MLOps and automated CI/CD pipelines for machine learning
Familiarity with VLA models, world models, or robotics middleware (e.g. ROS/ROS2)
Experience with reinforcement learning environments or simulation infrastructure

Why This Role

Build the infrastructure that the next generation of physical AI will stand on
Work directly with the labs and companies shipping frontier robotics and world model systems
Own a critical layer of the stack end to end — from raw video and sensor ingest to distributed training and real-time evaluation serving
High ownership, fast iteration, and direct impact on deployed systems

How to Apply

Please send your resume and any relevant work (papers, projects, repos) to: careers@orbifold.ai
