Member of Technical Staff (AI Inference Engineer)

Perplexity AI
York, NY Full Time
POSTED ON 5/25/2026
AVAILABLE BEFORE 6/25/2026
We build and run the inference engine behind every Perplexity query and deploy dozens of model architectures at scale with tight latency and cost budgets. Our stack is Rust, Python, CUDA, and CuTe DSL - and we need another engineer to join us.

What you will work on

Examples of real work the team does:
  • New model support. Support transformer-based retrieval, text-generation, and multimodal models in our inference infrastructure, from weight loading, request scheduling, and KV-cache management through to API Gateway support.
  • GPU kernel migration to CuTe DSL. Port our in-house CUDA kernels to NVIDIA's CuTe DSL so they run on GB200 today and are portable to Vera Rubin racks tomorrow.
  • Rust-native serving runtime. Develop our internal Rust-based inference server to address the pain points of Python-based serving and keep up with rapidly growing traffic.
  • Performance optimisation. Profile and fix bottlenecks from network ingress through continuous batching and GPU kernel interleaving.
  • Reliability and observability. Build dashboards, alerts, and automated remediation so we catch regressions before users do. Respond to and learn from production incidents.

Who we're looking for
  • Deep experience with GPU programming and performance work (CUDA, Triton, CUTLASS, or similar). Any other deep systems programming experience is a plus.
  • You understand modern LLM architectures and are able to bring them up reliably in a production environment.
  • You've built and operated production distributed systems under real load - ideally performance-critical ones.
  • Comfortable working across languages and layers: Rust for the serving runtime, Python for model code, CUDA/CuTe DSL for kernels.
  • You own problems end-to-end. You can read a research paper on Monday, write a kernel on Wednesday, and debug a production incident on Friday.
  • Self-directed. You do well in fast-moving environments where the path forward isn't laid out for you.

Good if you have worked with any of
  • ML compilers and framework internals: PyTorch internals, torch.compile, custom operators.
  • Distributed GPU communication: NCCL, NVLink, InfiniBand, RDMA libraries, model/tensor parallelism.
  • Low-precision inference: INT8/FP8/FP4 quantization, mixed-precision serving.
  • Profiling and debugging tools: Nsight Compute/Systems, CUDA-GDB, PTX/SASS analysis.
  • Container orchestration: Kubernetes, GPU scheduling, autoscaling inference workloads.

Qualifications
  • 3 years of professional software engineering experience with meaningful work on ML inference or high-performance systems.
  • Familiarity with at least one deep learning framework (PyTorch, JAX, TensorFlow).
  • Understanding of GPU architectures (memory hierarchy, warp scheduling, tensor cores).
  • Understanding of common LLM architectures and inference optimization techniques (e.g. quantization, speculative decoding, prefill-decode disaggregation).

Salary: $220,000 - $485,000





