
Software Engineer (Model Evaluation & Benchmarking)

SPREEAI
San Francisco, CA Full Time
POSTED ON 4/21/2026
AVAILABLE BEFORE 5/17/2026

About The Role

We are hiring Engineers focused on AI Model Evaluation to build the systems that ensure multimodal AI behaves reliably, consistently, and predictably as it moves from research into production. This position focuses on evaluating generative and vision-based models through automated benchmarking, dataset-driven testing, and performance validation pipelines.

You will work at the intersection of applied science, infrastructure, and product, helping define how we measure realism, consistency, and quality across image, video, and multimodal AI systems.


Why This Role Exists

Modern AI evaluation extends beyond pass/fail testing. Multimodal generative systems require:

  • benchmarking across visual realism, pose consistency, and identity preservation,
  • automated regression detection across model checkpoints,
  • scalable evaluation pipelines integrated into continuous deployment workflows.


We are building evaluation systems where research velocity and product reliability must coexist. This role is for engineers interested in defining how quality is measured in generative AI systems.


What You'll Do

  • Build automated evaluation pipelines for multimodal AI models.
  • Benchmark diffusion models, vision systems, and generative workflows.
  • Validate model checkpoints and detect regressions across versions.
  • Develop evaluation metrics for realism, consistency, and performance.
  • Integrate evaluation tooling into CI/CD workflows.
  • Collaborate with ML researchers and infrastructure teams to ensure production readiness.
  • Analyze failure modes and propose evaluation strategies.


Core Areas & Tooling

Candidates Should Be Familiar With Or Interested In

  • LLM, VLM, or Stable Diffusion model evals
  • Image/video benchmarking techniques
  • Multimodal evaluation frameworks
  • Dataset-driven testing workflows
  • Research experiment validation pipelines


Qualifications

  • Degree in Computer Science, AI, Engineering, or a comparable combination of education and practical experience.
  • Strong programming skills in Python.
  • Familiarity with object-oriented programming (C++, Java, Python, or similar).
  • Strong data structures and algorithms fundamentals.
  • Understanding of machine learning experimentation workflows.


Preferred Qualifications

  • Experience evaluating vision or generative models.
  • Familiarity with the Hugging Face ecosystem or open-source ML toolkits.
  • Experience building automated test frameworks or benchmarking tools.
  • Knowledge of diffusion models or multimodal architectures.
  • Experience with data analysis tools (NumPy, Pandas, visualization libraries).


SPREEAI is a fast-growing, innovative AI company at the forefront of fashion and e-commerce, revolutionizing how consumers engage with fashion through lifelike photorealistic try-on technology and hyper-personalized shopping experiences. Our mission is to redefine the retail landscape with cutting-edge AI solutions that blend high fashion and technology. We thrive in a dynamic, fast-paced environment where creativity meets technology to drive real impact. If you are passionate about innovation and shaping the future of fashion, SPREEAI offers a platform to make your mark.

Salary.com Estimation for Software Engineer (Model Evaluation & Benchmarking) in San Francisco, CA
$117,657 to $148,170


