
Member of Technical Staff, Evals

Magic
San Francisco, CA | Full Time
POSTED ON 6/2/2026
AVAILABLE BEFORE 7/1/2026
Magic’s mission is to build safe AGI that accelerates humanity’s progress on the world’s most important problems. We believe the most promising path to safe AGI lies in automating research and code generation to improve models and solve alignment more reliably than humans can alone. Our approach combines frontier-scale pre-training, domain-specific RL, ultra-long context, and inference-time compute to achieve this goal.

About The Role

Evals builds the internal platform that teams across Magic use to evaluate the performance of first-party and third-party models. The team supports pre-training, post-training, data, inference, and product, and sits on the critical path of many of the company's most important decisions.

As a Member of Technical Staff on Evals, you will build both the platform and the evaluations themselves. You'll develop infrastructure for large-scale evaluations, data ablations, and dataset quality analysis, while designing and validating the methodologies used to measure model performance.

Sweating the details matters on this team. Many benchmarks, papers, and open-source evaluation frameworks contain subtle bugs or flawed assumptions that lead to misleading conclusions. We care deeply about correctness, reproducibility, and measurement quality.

Evals are essential to the success of the company. By building trustworthy evaluation systems, you will help Magic make better research decisions, build better datasets, and ship better products.

What You'll Work On

  • Build and maintain the internal evals platform used across Magic
  • Design, implement, and validate eval tasks for pre-training, post-training, reinforcement learning, inference, and product systems
  • Develop infrastructure for running large-scale evaluations
  • Build systems to measure dataset quality and identify opportunities to improve training data
  • Improve evaluation correctness, reproducibility, and reliability
  • Audit and improve upon public benchmarks, evaluation methodologies, and open-source implementations
  • Partner with research, data, inference, and product teams to define metrics that accurately reflect model quality
  • Build tooling and frameworks that enable teams across Magic to make decisions based on trustworthy measurements

What We're Looking For

  • Strong software engineering fundamentals
  • Experience building production systems, internal platforms, or developer infrastructure
  • Exceptional attention to detail and a high bar for correctness
  • Experience working with machine learning systems, evaluation frameworks, data infrastructure, or research tooling
  • Ability to reason critically about benchmarks, metrics, and experimental methodology
  • Strong intuition for measurement quality and experimental design
  • Experience designing, implementing, or operating systems that run at scale
  • Strong debugging and investigative skills
  • Comfort navigating ambiguity and determining whether a measurement actually captures the behavior it claims to measure
  • Skepticism toward results that cannot be reproduced, validated, or explained
  • Track record of owning technical projects end-to-end
  • Excitement about helping researchers and engineers make better decisions through trustworthy measurements

Compensation, Benefits, And Perks (US)

  • Annual salary range of $200K to $550K, depending on experience
  • Equity is a significant part of total compensation, in addition to salary
  • 401(k) plan with 6% salary matching
  • Generous health, dental, and vision insurance for you and your dependents
  • Unlimited paid time off
  • Visa sponsorship and relocation support for candidates moving to San Francisco
  • A small, fast-moving, highly collaborative team working on frontier AI systems

Magic strives to be the place where high-potential individuals can do their best work. We value quick learning and grit just as much as skill and experience.

Our Culture

  • Integrity. Words and actions should be aligned
  • Hands-on. At Magic, everyone is building
  • Teamwork. We move as one team, not N individuals
  • Focus. Safely deploy AGI. Everything else is noise
  • Quality. Magic should feel like magic




