What are the responsibilities and job description for the Senior / Staff AI Research Engineer, Data Infrastructure position at RoboForce?
Why RoboForce
RoboForce is an AI robotics company developing Physical AI–powered Robo-Labor for dull, dirty, and dangerous work. The company's robots are engineered for demanding industrial environments, with a focus on real-world deployment and scalability.
We are looking for a Senior / Staff AI Research Engineer, Data Infrastructure to build the data and learning engine behind RoboForce's Physical AI stack. In this role, you will own the full pipeline — from raw teleoperation and UMI device data collection through curation, annotation, and storage, to post-training infrastructure that scores demonstrations, identifies failure patterns, and closes the loop back into model retraining.
Responsibilities
- Design and maintain end-to-end data collection pipelines ingesting multimodal demonstration data from teleoperation devices and UMI hardware, including synchronization, versioning, and distributed storage at scale.
- Build annotation tooling and data curation workflows — quality filtering, deduplication, episode scoring, and domain reweighting — to produce high-quality training datasets for robot policy learning.
- Develop post-SFT reinforcement learning infrastructure: implement reward scoring on demonstrations, mine and categorize failure patterns, and feed curated failure data back into the retraining loop.
- Build evaluation and test infrastructure to log policy rollouts on-robot, capture structured results, and surface actionable diagnostics for the research team.
- Collaborate with ML researchers to define data schemas, episode formats, and pipeline interfaces that support rapid iteration on VLA and manipulation policy training.
- Architect scalable storage and retrieval systems for heterogeneous robot data (vision, proprioception, action, language) across both cloud and on-prem environments.
Requirements
- Bachelor's or Master's degree in Computer Science, Robotics, or a related field, with 5 years of experience.
- Strong proficiency in Python and experience building production-grade data pipelines and ETL systems.
- Hands-on experience with large-scale dataset management, including versioning, deduplication, quality filtering, and distributed storage (e.g., S3, GCS, HDF5, WebDataset, Zarr).
- Experience building or working with post-training infrastructure — SFT pipelines, reward modeling, or RL training loops (e.g., PPO, DPO, rejection sampling).
- Familiarity with deep learning frameworks (PyTorch, JAX) and ML training workflows sufficient to collaborate tightly with research teams.
- This role requires on-site work 5 days/week, collaborating closely with the team.
Bonus Qualifications
- Experience with robotics data collection hardware — teleoperation devices, UMI, GELLO, or similar — and the synchronization and preprocessing challenges they introduce.
- Familiarity with robot learning pipelines: imitation learning, behavior cloning, or VLA/VLM fine-tuning workflows.
- Experience building evaluation or experiment tracking infrastructure (e.g., Weights & Biases, MLflow, custom rollout loggers).
- Proven ability to design annotation tooling or human-in-the-loop labeling systems for structured or multimodal data.
Benefits
- Competitive stock options/equity programs.
- Health, dental, and vision insurance, 401(k) plan.
- Visa sponsorship and green card support for qualified candidates.
- Lunches and dinners, a fully stocked kitchen, and regular team-building events.