Evaluation Lead

Archetype AI
Palo Alto, CA Full Time
POSTED ON 4/5/2026
AVAILABLE BEFORE 7/12/2026

About Archetype AI

Archetype AI is developing the world's first AI platform to bring AI into the real world. Formed by an exceptionally high-caliber team from Google, Archetype AI is building a foundation model for the physical world: a real-time multimodal LLM that transforms real-world data into valuable insights and knowledge that people will be able to interact with naturally. It will help people in their real lives, not just online, because it understands the real-time physical environment and everything that happens in it.

Supported by deep tech venture funds in Silicon Valley, Archetype AI is currently pre-Series A and is progressing rapidly to develop technology for its next stage. This is a rare opportunity to join an exciting AI team at the beginning of its journey, located in the heart of Silicon Valley.

Our team is headquartered in Palo Alto, California, with team members throughout the US and Europe.

We are actively growing, so if you are an exceptional candidate excited to work on the cutting edge of physical AI and don’t see a role below that exactly fits you, you can contact us directly with your resume via jobs@archetypeai.io.

About The Role

Archetype AI is seeking a hands-on Evaluation Lead to build evaluations and assess model performance for physical AI. You will design and implement advanced evaluation techniques for assessing the strengths and weaknesses of real-world AI models, and build and scale evaluation frameworks to rapidly test and generate reports on model performance. Responsibilities include partnering closely with research and engineering teams to develop evaluation methodologies, analytically assessing and improving test datasets, uncovering model weaknesses or risks, and tracking competitive industry benchmarks. This is a high-impact role for someone who thrives in a fast-paced AI environment and wants to directly influence our path as we scale our AI technologies and business.

Core Responsibilities

  • Drive Benchmarking & Evaluation
    • Design and implement rigorous evaluation methodologies and benchmarks for measuring model effectiveness, reliability, alignment, and safety
    • Lead evaluation of model performance, ranging from offline experiments to full production model testing
  • Build & Scale Evaluation Frameworks
    • Design and oversee the pipelines, dashboards, and tools that automate model evaluation
    • Design and oversee tools for A/B model testing, regression testing, and production model performance
  • Lead Evaluation Strategy
    • Develop and implement strategies for evaluating physical AI models that can scale across a broad range of real-world use cases, sensor types, and edge cases
    • Plan, run, and oversee evaluations across internal teams and external customers
    • Drive edge case discovery, red-teaming, safety, privacy, and risk evaluation, feeding findings back to key stakeholders in research and engineering teams

Key Requirements

  • Extensive expertise in evaluating AI and machine learning models, ideally in physical AI or a related AI field
  • Experience in designing, implementing, and refining evaluation metrics
  • Deep understanding of machine learning, AI, and generative models
  • Excellent Python and software engineering skills
  • Experience designing and building scalable data pipelines and evaluation tools
  • Experience collaborating closely with key stakeholders from research, engineering, and product teams
  • Strong communication and documentation skills, with a bias for creating detailed evaluation reports that help drive model performance
  • Startup-ready mindset with the ability to thrive in high-velocity, high-ambiguity environments

What We Would Love To See

  • Experience evaluating real-world, real-time algorithms
  • Experience evaluating a broad range of sensor types, such as cameras, LIDAR, physical sensors, RF sensors, and beyond
  • A strong scientific approach to evaluation and understanding model performance
  • Experience in evaluating production algorithms
  • Experience building and curating data campaigns to create extensive test datasets
  • Experience managing internal teams and/or external vendors

Salary.com Estimation for Evaluation Lead in Palo Alto, CA
$157,788 to $192,745
