Member of Technical Staff - ML Systems & Inference

Gimlet Labs, Inc.
San Francisco, CA (Full Time)
POSTED ON 5/11/2026
AVAILABLE BEFORE 6/7/2026
About Us

Gimlet Labs is building the first heterogeneous neocloud for AI workloads.

As AI systems scale, the industry is hitting fundamental limits in power, capacity, and cost with today’s homogeneous, vertically integrated infrastructure. Gimlet addresses this by decoupling AI workloads from the underlying hardware. Our platform intelligently partitions workloads into components and orchestrates each component onto the hardware that best fits its performance and efficiency needs. This approach enables heterogeneous systems across multi-vendor and multi-generation hardware, including the latest emerging accelerators. These systems unlock step-function improvements in performance and cost efficiency at scale.

On top of this foundation, Gimlet is building a production-grade neocloud for agentic workloads. Customers use Gimlet to deploy and manage their workloads through stable, production-ready APIs, without having to reason about hardware selection, placement, or low-level performance optimization.

Gimlet works with foundation labs, hyperscalers, and AI-native companies to power real production workloads, built to scale to gigawatt-class AI datacenters.

Mission

Gimlet Labs is seeking a Member of Technical Staff focused on ML systems and inference. In this role, you will design and build the inference systems that execute full models end-to-end under real production constraints. You will work at the intersection of model architecture, runtime behavior, and system performance to ensure inference is fast, predictable, and scalable.

This role is ideal for engineers who deeply understand how modern models execute in practice and who care about latency, throughput, and memory behavior across the full inference lifecycle.

Responsibilities

  • Design and optimize end-to-end inference pipelines from request ingestion through execution and response
  • Build and evolve inference runtimes that balance latency, throughput, and concurrency under real-world load
  • Reason about batching, queuing, and scheduling tradeoffs, including their impact on tail latency and fairness
  • Manage KV cache allocation, placement, reuse, and eviction across models and requests
  • Optimize prefill and decode paths, including attention mechanisms and memory usage
  • Profile and debug inference performance issues across model, runtime, and system boundaries
  • Work closely across compilers, kernels, networking, and distributed systems to deliver end-to-end performance improvements

Qualifications

  • Strong software engineering fundamentals
  • Experience building or operating ML inference or model serving systems
  • Comfort reasoning about performance, memory usage, and system behavior under load

Preferred Qualifications

  • Experience with inference runtimes such as TensorRT-LLM, vLLM, or custom serving systems
  • Deep understanding of modern model architectures and attention mechanisms
  • Experience with batching, scheduling, and concurrency control in inference systems
  • Familiarity with KV cache management and memory placement strategies
  • Experience profiling and tuning latency- and throughput-critical systems
  • Software development experience in Python and C

Salary.com Estimation for Member of Technical Staff - ML Systems & Inference in San Francisco, CA
$71,942 to $87,826