
Member of Technical Staff (Research Engineer - LLM Systems & Performance)

Contextual AI
Mountain View, CA Full Time
POSTED ON 12/31/2025
AVAILABLE BEFORE 2/6/2026
About Contextual AI

We're revolutionizing how AI Agents work by solving AI's most critical challenge: context. The right context at the right time unlocks the accuracy and production scale that enterprises leveraging AI require. Our enterprise AI development platform sits at the intersection of breakthrough AI research and practical developer needs. Our end-to-end platform allows AI developers to easily and accurately ingest and query documents from enterprise data sources and embed the retrieval results into their business workflows.

Contextual AI was founded by the pioneers of Retrieval-Augmented Generation (RAG), the foundational technique behind the context layer, connecting foundation models to current and relevant information. Backed by the industry's most forward-thinking venture capitalists, we're not just participating in the enterprise AI revolution, we're defining it. Join us in building a future where AI doesn't just answer questions, it transforms businesses.

About the role

As a Member of Technical Staff (Research Engineer - LLM Systems & Performance), you will be part of a small, high-impact team building and optimizing LLM systems end-to-end, from Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) pipelines to high-throughput inference clusters in production. You will collaborate closely with researchers and engineers to develop advanced models and infrastructure for the context layer.

What you'll do

  • Implement and improve components of our SFT and RL training pipelines (e.g., Verl, SkyRL), including data loading, training loops, logging, and evaluation.
  • Contribute to LLM inference infrastructure (e.g., vLLM, SGLang), including batching, KV-cache management, scheduling, and serving optimizations.
  • Profile and optimize end-to-end performance (throughput, latency, compute/memory/bandwidth), using tools like Nsight and profilers to identify and fix bottlenecks.
  • Work with distributed training and inference setups using NCCL, NVLink, and data/tensor/pipeline/expert/context parallelism on multi-GPU clusters.
  • Help experiment with and productionize quantization (e.g., INT8, FP8, FP4, mixed-precision) for both training and inference.
  • Write and optimize GPU kernels using tools like CUDA or Triton, and leverage techniques such as FlashAttention and Tensor Cores where appropriate.
  • Collaborate with researchers to take ideas from paper → prototype → scaled experiments → production.
  • Write clean, well-tested, and well-documented code that can be shared across multiple teams (Research, Platform and Products).

What we're seeking

  • Bachelor's or Master's degree in Computer Science, Electrical Engineering, or a related technical field (or equivalent practical experience).
  • Strong programming skills in Python.
  • Experience with at least one major ML framework: PyTorch or JAX.
  • Solid understanding of GPU computing fundamentals (threads/warps/blocks, memory hierarchy, bandwidth vs compute, etc.).
  • Familiarity with distributed training or inference concepts (e.g., model parallelism, collective communication, disaggregated serving, KV caching).
  • Interest in performance engineering: profiling, kernel fusion, memory layout, and end-to-end system efficiency.
  • Ability to work in a fast-paced environment, communicate clearly, and collaborate closely with other engineers and researchers.

Location: Mountain View, CA.

Salary range for California-based applicants: $170,000 - $200,000 plus equity and benefits (actual compensation will be determined based on experience, location, and other factors permitted by law).

Equal Opportunity

Contextual AI is an equal opportunity employer and complies with all applicable federal, state, and local fair employment practices laws. All qualified applicants will receive consideration for employment without regard to race, color, religion, national origin, ancestry, sex, sexual orientation, gender, gender expression, gender identity, genetic information or characteristics, physical or mental disability, marital/domestic partner status, age, military/veteran status, medical condition, or any other characteristic protected by law.





