Lead Engineer, ML Network Stack - Annapurna Labs

Amazon Web Services (AWS)
Cupertino, CA Full Time
POSTED ON 5/26/2026
AVAILABLE BEFORE 7/26/2026
Description

We are seeking an experienced engineer and technical leader to join our team that owns the network stack for EC2 distributed AI/ML systems. The team develops support for a variety of frameworks and communication libraries including NCCL, NVSHMEM, NIXL, NCCL GIN, and Perplexity kernels. Solid knowledge of Linux, networking, and performant coding is important. Experience with embedded systems is valued, and experience with high-speed networking or HPC/RDMA interconnects is highly valued.

If you like solving hard problems, want to work with HPC and ML customers, iterate fast and deliver meaningful solutions at scale, then come join us! This truly is a role at the forefront of AI/ML—you'll be working on features for the largest clusters, with the largest customers, for the largest AI models. This is a role for a technical lead with the expectation to grow into a technical manager role. We are specifically seeking candidates who want to develop their career as a technical manager.

The organization you would be joining is Annapurna Labs, an integral part of AWS that develops hardware and software components that are critical building blocks for EC2 infrastructure. Every instance in EC2 is running some type of hardware designed by Annapurna Labs. We specialize in designing software, systems, and chips that optimize the AWS customer experience.

Key job responsibilities

  • Be the lead engineer on a team that builds and maintains the infrastructure that monitors and reports on functionality and performance of massive testing workloads run at scale.
  • Use internal Amazon CI/CD tools, Linux, and public AWS products to automate the delivery of our software to customers, saving developer time.
  • Write Python code that effortlessly spools up large clusters and runs benchmarks and applications for ML and HPC workloads.
  • Use AWS Managed Grafana and Athena to digest the massive amount of performance data generated by these workloads and create dashboards for developers and stakeholders.
  • Invent automatic mechanisms to alert developers to functional and performance regressions so they never reach customers.
  • Manage the complexity of infrastructure that covers many instance types, software stacks, Linux operating systems, and cutting-edge releases, and make it easy to evolve.

Diverse Experiences

AWS values diverse experiences. Even if you do not meet all of the qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn’t followed a traditional path, or includes alternative experiences, don’t let it stop you from applying.

Work/Life Balance

We value work-life harmony. Achieving success at work should never come at the expense of sacrifices at home, which is why we strive for flexibility as part of our working culture. When we feel supported in the workplace and at home, there’s nothing we can’t achieve in the cloud.

Mentorship & Career Growth

We’re continuously raising our performance bar as we strive to become Earth’s Best Employer. That’s why you’ll find endless knowledge-sharing, mentorship and other career-advancing resources here to help you develop into a better-rounded professional.

Basic Qualifications

  • 5 years of non-internship professional software development experience
  • 5 years of leading design or architecture (design patterns, reliability and scaling) of new and existing systems experience
  • 5 years of full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations experience
  • 3 years as a mentor, tech lead or leading engineering teams
  • 3 years of experience in SW/HW co-design

Preferred Qualifications

  • Bachelor's degree in computer science or equivalent
  • Experience creating automated dashboards and visualization (such as Grafana)

Amazon is an equal opportunity employer and does not discriminate on the basis of protected veteran status, disability, or other legally protected status.

Los Angeles County applicants: Job duties for this position include: work safely and cooperatively with other employees, supervisors, and staff; adhere to standards of excellence despite stressful conditions; communicate effectively and respectfully with employees, supervisors, and staff to ensure exceptional customer service; and follow all federal, state, and local laws and Company policies. Criminal history may have a direct, adverse, and negative relationship with some of the material job duties of this position. These include the duties and responsibilities listed above, as well as the abilities to adhere to company policies, exercise sound judgment, effectively manage stress and work safely and respectfully with others, exhibit trustworthiness and professionalism, and safeguard business operations and the Company’s reputation. Pursuant to the Los Angeles County Fair Chance Ordinance, we will consider for employment qualified applicants with arrest and conviction records.

Our inclusive culture empowers Amazonians to deliver the best results for our customers. If you have a disability and need a workplace accommodation or adjustment during the application and hiring process, including support for the interview or onboarding process, please visit https://amazon.jobs/content/en/how-we-hire/accommodations for more information. If the country/region you’re applying in isn’t listed, please contact your Recruiting Partner.

The base salary range for this position is listed below. Your Amazon package will include sign-on payments and restricted stock units (RSUs). Final compensation will be determined based on factors including experience, qualifications, and location. Amazon also offers comprehensive benefits including health insurance (medical, dental, vision, prescription, Basic Life & AD&D insurance and option for Supplemental life plans, EAP, Mental Health Support, Medical Advice Line, Flexible Spending Accounts, Adoption and Surrogacy Reimbursement coverage), 401(k) matching, paid time off, and parental leave. Learn more about our benefits at https://amazon.jobs/en/benefits.

USA, CA, Cupertino - 193,300.00 - 261,500.00 USD annually

USA, WA, Seattle - 168,100.00 - 227,400.00 USD annually


Company - Annapurna Labs (U.S.) Inc.

Job ID: A3190812

Salary: $168,100 - $261,500
