
Senior Research Engineer, LLM Evaluation and Behavioral Analysis

togetherai
San Francisco, CA Full Time
POSTED ON 2/12/2026 CLOSED ON 4/12/2026

About the Role

Together AI is building the fastest, most capable open-source-aligned LLMs and inference stack in the world. As part of the Turbo organization, you will be a critical bridge between cutting-edge model research and real-world behavioral reliability. This role focuses on deeply understanding model behavior — probing reasoning, tool use, function calling, multi-step interactions, and subtle failure modes — and building the evaluation systems that ensure models behave intelligently and consistently in production.

You will develop robust evaluation pipelines, design high-quality behavioral test suites, and work closely with training, post-training, inference, and product teams to identify regressions, shape datasets, and influence model improvements. Your work will directly define how Together measures model quality and reliability across releases.

Responsibilities

  • Build and iterate on evaluation frameworks that measure model performance across instruction following, function calling, long-context reasoning, multi-turn dialog, safety, and agentic behaviors.
  • Develop specialized evaluation suites for:
    • Function calling — argument correctness, schema adherence, tool selection, multi-function planning, and error recovery.
    • Agentic workflows — task decomposition, multi-step planning, self-correction, and autonomous tool-use sequences.
    • Tool-augmented interactions — search, retrieval, code execution, API-driven actions.
  • Create automated CI/CD pipelines for A/B comparisons, regression detection, behavioral drift monitoring, and adversarial probing.
  • Design and curate high-quality evaluation datasets, especially nuanced or challenging cases across domains.
  • Collaborate with researchers and engineers to diagnose failures, triage regressions, and guide data selection, shaping strategies, objective design, and system improvements.
  • Work with engineering teams to build dashboards, reports, and internal tools that help visualize behavior changes across releases.
  • Operate in a fast-paced, high-impact environment with deep technical ownership and close partnership with world-class model researchers and infra engineers.

Requirements

  • Strong engineering skills with Python, evaluation tooling, and distributed workflows.
  • Experience working with LLMs or transformer-based models, particularly in model evaluation, testing, or red-teaming.
  • Ability to reason clearly about qualitative behavior, edge cases, and model failure patterns.
  • Experience designing experiments, building datasets, and interpreting noisy behavioral signals.
  • Understanding of function calling and structured output formats.
  • Familiarity with GPU or distributed compute environments.
  • Hands-on experience evaluating function-calling models, agentic systems, or tool-augmented LLM pipelines.
  • Experience with multi-turn or multi-step reasoning tasks.
  • Familiarity with inference systems, distributed infrastructure, or post-training workflows.
  • Passion for discovering subtle behaviors, surprising model gaps, or edge-case failures.

About Together AI

Together AI is a research-driven artificial intelligence company. We believe open and transparent AI systems will drive innovation and create the best outcomes for society. Our mission is to significantly lower the cost of modern AI systems by co-designing software, hardware, algorithms, and models. We have contributed to leading open-source research, models, and datasets including FlashAttention, Hyena, FlexGen, ATLAS, and RedPajama. We invite you to join a passionate group of researchers and engineers in building the next generation of AI infrastructure.

Compensation

We offer competitive compensation, startup equity, health insurance, and other benefits. The US base salary range for this full-time position is $220,000 – $270,000, plus equity and benefits. Compensation varies by location, level, and experience.

Equal Opportunity

Together AI is an Equal Opportunity Employer and is proud to offer equal opportunity to all individuals regardless of race, color, ancestry, religion, sex, sexual orientation, national origin, age, citizenship, marital status, disability, gender identity, veteran status, or other protected characteristics.

Please see our privacy policy at https://www.together.ai/privacy  