Machine Learning Evaluation Engineer

Bedrock Robotics
San Francisco, CA Full Time
POSTED ON 12/19/2025
AVAILABLE BEFORE 6/16/2026
The Role

Machine Learning Evaluation Engineer:

Bedrock is bringing autonomy to the construction industry! We’re a group of veterans from the autonomous vehicle industry who are passionate about bringing the benefits of automation to areas in the construction industry currently underserved by the market.

We’re looking for a highly motivated engineer with experience evaluating complex ML systems deployed in the real world. Your Mission: Translate the infinite nuance of the built world into actionable, AI-native evaluations that accelerate Bedrock Operator adoption.

The ideal candidate has hands-on experience building evaluation systems and designing and executing statistical tests to gauge performance deltas between system iterations. More importantly, you’ve iterated on complex ML systems running in production environments, and you understand the complexities that come with them.

What you’ll do:

Design and maintain eval systems:

  • Build pipelines for measuring system performance – across open-loop and closed-loop simulation, hardware-in-the-loop systems, and field data from Bedrock Operator-equipped machinery. Excite other teams to gain insights earlier in the development cycle through streamlined workflows.

Develop metrics:

  • Connect product goals and system behavior - by bridging real-world specification to measurable indicators from logged data. Empower confident decision making from parameter tuning to program planning by slicing through the noise and delivering objective insights.

Classify data sources for training and testing:

  • Implement infrastructure and classifiers - to self-annotate data and allow creation of datasets for a variety of training and evaluation use cases. Leverage models to source rich annotations for massive datasets to accelerate model iteration.

Predict system performance:

  • Model metrics and interpret results - from various sources ranging from raw sensor data to key leading indicators. Determine whether new construction sites pose hidden challenges and drive business decisions about deployment readiness.

What we’re looking for:

  • Engineers who are currently Senior or Staff level with 5 years of professional software engineering, data science, or research experience
  • 2 years of professional experience analyzing modern ML or robotics system performance on real-world problems
  • Proficiency in Python and a data warehouse query language, and comfort developing on infrastructure within parallelized cloud-based frameworks
  • Strong statistical analysis skills (e.g. classification, model fit bias determination, hypothesis testing, and uncertainty quantification)
  • Experience working with large datasets
  • Bonus points: We’re especially interested in engineers who have applied statistical backgrounds to ML research or real-world robotics applications.

Our roles are often flexible. If you don't fit all the criteria, or are in another location (especially one where we have an office, like SF or NY), please apply anyway! We'd love to consider you.

Join the team bringing advanced autonomy to the built world

At Bedrock, we've assembled one of the most experienced autonomous technology teams in the industry, with deep expertise scaling breakthroughs across transportation, infrastructure, and enterprise software. Our leaders helped put the first self-driving cars on public roads at Waymo, scaled systems for Segment's $3.2B acquisition, and grew Uber Freight to $5B in revenue.

While others debate the future of AI, we're deploying it in the real world. Our systems are already installed on heavy machines across the country, learning on real construction sites and working to reshape the earth with survey-grade precision and exceptional safety. This isn't a simulation—it's autonomous intelligence working on billion-dollar infrastructure projects.

In just over a year, we've raised $80M, put our equipment into the field, and established partnerships with forward-thinking contractors who are integrating our technology into their operations. We're working quickly to close the gap between America's surging demand for housing, data centers, and manufacturing hubs and the construction industry's growing labor shortage.

Here, algorithms meet steel-toed boots. You'll collaborate with both construction veterans and experienced engineers, tackling problems where your work directly impacts how the physical world gets built. If you're interested in applying cutting-edge technology to solve meaningful problems alongside a talented team, we'd love to have you join us.

Salary : $80
