
Network Reliability Engineer - Decentralized High-Performance Computing Leader

Andiamo
Seattle, WA Full Time
POSTED ON 1/7/2026
AVAILABLE BEFORE 2/13/2026
Senior / Staff Network Reliability Engineer – High-Performance Compute & AI Infrastructure

About The Role

We’re searching for an expert Network Reliability Engineer to architect, optimize, and operate the high-performance network fabrics that power large-scale AI and HPC workloads. You’ll be at the core of the engineering team responsible for building ultra-low-latency, high-throughput networks that connect thousands of GPUs and servers across global datacenters.

This isn’t a traditional networking role — it’s an opportunity to shape the performance backbone of some of the world’s most demanding compute environments. You’ll blend deep networking expertise with software engineering to deliver systems that are not only reliable and scalable but also faster and more efficient than ever before.

What You’ll Do

  • Engineer next-generation network performance: Fine-tune TCP/IP, RDMA (RoCE), kernel-bypass technologies (DPDK, XDP, eBPF), and NIC offloads to push latency and throughput to their physical limits for high-performance computing workloads.
  • Deploy and scale at massive capacity: Roll out and optimize large-scale network fabrics across datacenters using top-tier hardware (Arista, NVIDIA/Mellanox, Juniper, and more). Configure advanced BGP/EVPN topologies, spine-leaf architectures, and congestion management for lossless transport.
  • Automate network intelligence: Build telemetry pipelines and automated systems for real-time performance monitoring, packet-loss detection, and predictive congestion analysis across complex environments.
  • Debug at the deepest levels: Lead investigations into packet loss, latency anomalies, and congestion hot spots — diving into kernel traces, switch firmware, and flow control mechanisms to pinpoint and resolve issues.
  • Collaborate with the industry’s best: Work directly with hardware and silicon vendors to debug firmware, optimize RDMA and RoCE paths, validate optics, and integrate emerging technologies like 800G links and CPO/LPO networking.
  • Design for resilience and reliability: Simulate large-scale network failures, run game-day exercises, and turn lessons learned into robust automation, playbooks, and SLOs that drive measurable reliability improvements.

Who You Are

  • 7 years of experience in network engineering, SRE, or performance infrastructure roles — ideally within AI, HPC, or large-scale cloud environments.
  • Deep understanding of the Linux networking stack, including kernel-level debugging, TCP/IP, InfiniBand, and RoCE.
  • Proven hands-on experience managing multi-layer datacenter networks, network overlays (VXLAN, Geneve), and multi-vendor environments (Arista, NVIDIA/Mellanox, Juniper, etc.).
  • Strong programming proficiency in Python, Go, or Rust, and experience with Infrastructure-as-Code and modern CI/CD practices.
  • Practical knowledge of DPDK, XDP, eBPF, and hardware acceleration frameworks used in low-latency networking.
  • Demonstrated success in building and scaling high-performance, low-latency network architectures for data-intensive systems or compute clusters.

Why This Role Matters

Modern AI and high-performance computing workloads push data through networks at unprecedented speed and scale. This role sits at the intersection of innovation and reliability, where every microsecond and every packet matters. As a Senior or Staff Network Reliability Engineer, you’ll design and operate the connective tissue of advanced compute infrastructure, ensuring the world’s most powerful systems run seamlessly, efficiently, and at peak performance.

About Andiamo

Andiamo is a globally recognized staffing and consulting firm specializing in placing the top 2% of technology and go-to-market professionals with the world’s largest and most well-known companies.

For over 20 years, we've been a tier-one vendor for firms such as Amazon, Bloomberg, Palantir, MasterCard, Visa, Two Sigma, and Citadel, as well as other major financial services firms, elite hedge funds, Google-backed tech start-ups, and major software firms.

Our talent solutions include Permanent Placement, Contract Staffing, Executive Search, and Dedicated Recruiting Services (RPO). Find out more at www.andiamogo.com.

Salary.com Estimation for Network Reliability Engineer - Decentralized High-Performance Computing Leader in Seattle, WA
$122,225 to $143,317


