AI/HPC System Performance Engineer

Meta
Menlo Park, CA | Full Time
POSTED ON 5/17/2026
AVAILABLE BEFORE 6/15/2026
Meta is building some of the world's largest AI and high-performance computing infrastructure to power next-generation AI research and products. As an AI/HPC System Performance Engineer on the Network Infrastructure Engineering team, you will drive end-to-end performance characterization, bottleneck analysis, and optimization of large-scale AI training and inference clusters. In this role, you will work at the intersection of network fabric design, distributed computing, and AI workload behavior to ensure Meta's HPC systems deliver maximum throughput and efficiency for frontier model development.

AI/HPC System Performance Engineer Responsibilities:

  • Profile and benchmark AI training and inference workloads across large-scale HPC clusters to identify network, compute, and memory bottlenecks
  • Develop and maintain performance analysis frameworks and dashboards to track system-level metrics including GPU utilization, network bandwidth, latency, and collective communication efficiency
  • Investigate and resolve performance regressions in distributed AI training environments, including issues related to RDMA fabrics, collective communication libraries, and job scheduling
  • Collaborate with network infrastructure, hardware, and AI research teams to define performance requirements and validate new HPC cluster configurations
  • Design and execute capacity and scalability experiments to inform network topology decisions for AI supercomputing infrastructure
  • Build tooling and automation to continuously monitor HPC system health, detect anomalies, and reduce mean time to mitigation during performance incidents
  • Establish service level objectives for AI cluster network performance and drive cross-functional alignment on reliability and efficiency targets
  • Lead technical design reviews for network and system architecture changes affecting AI workload performance, communicating trade-offs clearly to engineering and product stakeholders
  • Mentor other engineers on HPC performance methodologies, debugging techniques, and instrumentation best practices
  • Leverage AI-assisted workflows to accelerate root cause analysis, automate routine performance reporting, and expand coverage across the HPC stack

Minimum Qualifications:

  • Experience profiling and optimizing distributed AI or HPC workloads, including familiarity with GPU interconnects, RDMA networking, and collective communication frameworks such as NCCL or MPI
  • Experience debugging complex, non-reproducible performance issues across multi-layer systems including network fabric, operating system, and application layers
  • Experience designing and implementing performance monitoring systems, including instrumentation, telemetry pipelines, and alerting for large-scale infrastructure
  • Experience driving cross-functional technical projects from requirements definition through production deployment, including communicating performance findings and trade-offs to diverse stakeholders
  • 6 years of experience in system performance engineering, network infrastructure engineering, or a related field within large-scale distributed computing or HPC environments

Preferred Qualifications:

  • Experience developing systems software in languages such as C
  • Experience with machine learning frameworks such as PyTorch and TensorFlow
  • Understanding of RDMA congestion control mechanisms on InfiniBand (IB) and RoCE networks
  • Understanding of the latest artificial intelligence (AI) technologies
  • Understanding of AI training workloads and the demands they place on networks
  • Demonstrated ability to integrate AI tools to optimize or redesign workflows and drive measurable impact (e.g., efficiency gains, quality improvements)
  • Experience adhering to and implementing responsible, ethical AI practices (e.g., risk assessment, bias mitigation, quality and accuracy reviews)
  • Demonstrated ongoing AI skill development (e.g., prompt/context engineering, agent orchestration) and staying current with emerging AI technologies

About Meta:

Meta builds technologies that help people connect, find communities, and grow businesses. When Facebook launched in 2004, it changed the way people connect. Apps like Messenger, Instagram and WhatsApp further empowered billions around the world. Now, Meta is moving beyond 2D screens toward immersive experiences like augmented and virtual reality to help build the next evolution in social technology. People who choose to build their careers by building with us at Meta help shape a future that will take us beyond what digital connection makes possible today—beyond the constraints of screens, the limits of distance, and even the rules of physics.

Meta is proud to be an Equal Employment Opportunity and Affirmative Action employer. We do not discriminate based upon race, religion, color, national origin, sex (including pregnancy, childbirth, or related medical conditions), sexual orientation, gender, gender identity, gender expression, transgender status, sexual stereotypes, age, status as a protected veteran, status as an individual with a disability, or other applicable legally protected characteristics. We also consider qualified applicants with criminal histories, consistent with applicable federal, state and local law. Meta participates in the E-Verify program in certain locations, as required by law. Please note that Meta may leverage artificial intelligence and machine learning technologies in connection with applications for employment.

Meta is committed to providing reasonable accommodations for candidates with disabilities in our recruiting process. If you need any assistance or accommodations due to a disability, please let us know at accommodations-ext@meta.com.

$154,000/year to $217,000/year + bonus + equity + benefits

Individual compensation is determined by skills, qualifications, experience, and location. Compensation details listed in this posting reflect the base hourly rate, monthly rate, or annual salary only, and do not include bonus, equity or sales incentives, if applicable. In addition to base compensation, Meta offers benefits. Learn more about benefits at Meta.
