
RDMA Ops Engineer - Computing Infrastructure Networking

Alibaba Cloud
Sunnyvale, CA Full Time
POSTED ON 12/24/2025
AVAILABLE BEFORE 2/2/2026

We're seeking a skilled RDMA Ops Engineer to optimize and maintain high-performance networking infrastructure for our computing clusters. This role focuses on building and operating ultra-low latency, high-throughput networks using RDMA technologies to power next-generation computing workloads.


Key Responsibilities:

• Deploy, operate, and maintain RDMA-based network architectures (RoCE/InfiniBand) for clusters with thousands of nodes

• Optimize network performance for distributed collective communication workloads (NCCL, MPI, etc.)

• Solve complex network issues in distributed collective communication (e.g., NCCL/MPI communication bottlenecks)

• Use automation tools for network provisioning, monitoring, diagnostics, and network performance profiling (latency/throughput analysis)

• Implement CI/CD pipelines for network infrastructure-as-code

• Manage end-to-end network lifecycle: deployment, configuration, monitoring, upgrades

• Collaborate with computing algorithm engineers to troubleshoot network-related bottlenecks in training/inference pipelines

• Bridge Computing framework requirements with underlying network infrastructure capabilities

• Ensure compliance with security and scalability requirements



Minimum qualifications:

- Strong scripting skills (Python/Go/Bash) for operational automation

- Expert-level RDMA operational experience (RoCEv2/InfiniBand)

- Understanding of Linux internals (kernel bypass, syscall optimization, etc.) and proficiency in Linux network stack tuning (irqbalance, NUMA, hugepages)

- Hands-on experience with RDMA/DPDK performance tuning

- Strong knowledge of network protocols (TCP/IP, RoCEv2) and NIC architecture principles

- Ability to abstract complex technical concepts into architectural diagrams

- Proven track record of translating R&D innovations into production solutions

- Strong communication skills for cross-functional collaboration with Computing researchers and SRE teams


Preferred qualifications:

- Experience managing production Computing networks

- Familiarity with Kubernetes networking (CNI, Multus, SR-IOV) and GPU-aware scheduling

- Background in Computing system optimization (NVIDIA collective libraries, MPI tuning)

- Deep understanding of Computing workload patterns and their network implications




The pay range for this position at commencement of employment is expected to be between $104,400 and $171,000/year. However, base pay offered may vary depending on multiple individualized factors, including market location, job-related knowledge, skills, and experience.


If hired, employee will be in an “at-will position” and the Company reserves the right to modify base salary (as well as any other discretionary payment or compensation program) at any time, including for reasons related to individual performance, Company or individual department/team performance, and market factors.

Salary: $104,400 - $171,000


