
Annotation Data Scientist, Evaluation Integrity (Siri)

Apple, Inc.
Cambridge, MA Full Time
POSTED ON 6/12/2026
AVAILABLE BEFORE 7/12/2026
Play a part in the ongoing revolution in human-computer interaction. Siri is evolving - and the way we evaluate it has to evolve with it. Join the Evaluation Integrity team to help build the trusted quality signal behind every Siri release.

Within the Siri evaluation organization, the Human Evaluation sub-team is responsible for answering the question: can we trust our evals? We do that by designing human-in-the-loop (HITL) annotation tasks that scrutinize every moving part of an agentic evaluation - the simulated user agent, the conversation it has with Siri, and the automated evaluators that grade the exchange. This role sits at the intersection of data science, human annotation engineering, and evaluation methodology, and is instrumental in turning human judgment into a rigorous, reproducible signal that directly informs pre-ship model and product decisions.

As an Annotation Data Scientist on the Evaluation Integrity team, you will design and run HITL annotation projects that evaluate the quality and authenticity of agentic user personae, the validity of agent-to-agent conversations, and the reliability of LLM-as-judge and rule-based evaluators against Siri's product specifications. You will own annotation initiatives end to end, from rubric design and tooling, through annotator calibration, to data science analysis that turns annotator judgments into actionable signal for modeling, planning, and product teams.

  • Bachelor's or Master's degree in a quantitative or related field such as Data Science, Computer Science, Linguistics, Statistics, or Cognitive Science, or equivalent job-related experience.
  • 3 years of hands-on experience working with human-annotated datasets or human-in-the-loop evaluation methodologies for machine learning, natural language processing, or large language model systems.
  • 3 years of experience using Python for data processing, analysis, and prototyping, including experience with libraries such as pandas, Jupyter, and at least one data visualization library.
  • Experience designing, implementing, and communicating annotation schemas, rubrics, or ontologies for machine learning training or evaluation data.
  • Experience managing multiple concurrent dataset curation efforts, including scoping work, iterating on guidelines, coordinating with in-house or vendor annotators, and monitoring annotator performance metrics such as accuracy, throughput, and inter-annotator agreement.
  • Experience specifying or designing custom annotation tooling in collaboration with software engineers.

  • Experience evaluating LLM-powered or agentic systems, including familiarity with LLM-as-judge methodologies, rubric-based grading, or trajectory and tool-call evaluation.
  • Familiarity with statistical methods that address accuracy and variability in human annotation data, such as inter-annotator agreement, Cohen's or Fleiss' kappa, Krippendorff's alpha, or bootstrapping.
  • Data-querying experience with SQL, Spark, or similar, and comfort working with large, complex, real-world datasets.
  • Experience building pre-ship evaluation pipelines for conversational or assistant products.
  • Experience with prompt engineering, or with designing simulated user personae for agent evaluation.
  • Experience running annotation programs across multiple locales or at large scale.
  • Excellent written and verbal communication skills, with the ability to explain technical topics clearly to data scientists, engineers, annotators, and cross-functional partners.
  • Proven ability to collaborate effectively across functions and drive projects of varying sizes and scopes - knowing when to dive deep and when to delegate.

Salary.com Estimation for Annotation Data Scientist, Evaluation Integrity (Siri) in Cambridge, MA
$107,875 to $133,632


Job openings at Apple, Inc.

  • Apple, Inc. Boulder, CO
  • Do you have a passion for quality and want to lead a team that shapes cutting-edge technologies at Apple? Quality is the cornerstone of what makes our pro... more
  • 4 Days Ago

  • Apple, Inc. Washington, WA
  • Security is at the heart of Apple's products and services. The Security Enablement Team at Apple focuses on building Software and Services that provide fou... more
  • 4 Days Ago

  • Apple, Inc. Seattle, WA
  • Come help us build the next generation cloud platform to support internal and public-facing services across Apple. In Apple Services Engineering (ASE), we ... more
  • 4 Days Ago

  • Apple, Inc. Washington, WA
  • As a Machine Learning Engineer in the Machine Intelligence Neural Design (MIND) team, you will have an opportunity to be part of an ML innovation organizat... more
  • 4 Days Ago


Not the job you're looking for? Here are some other Annotation Data Scientist, Evaluation Integrity (Siri) jobs in the Cambridge, MA area that may be a better fit.

  • Boston Dynamics Waltham, MA
  • Boston Dynamics is a world leader in mobile robots, tackling some of the toughest robotics challenges. We combine the principles of dynamic control and bal... more
  • 7 Days Ago

  • Cohere Boston, MA
  • Who are we? Our mission is to scale intelligence to serve humanity. We’re training and deploying frontier models for developers and enterprises who are bui... more
  • 19 Days Ago
