What are the responsibilities and job description for the Senior Staff Infrastructure AI/ML RDMA RoCEv2 Engineer position at Tara Technical Solutions (TTS)?
Company Description
Tara Technical Solutions (TTS) is the authorized vendor for our Fortune 500 client.
We are representing full-time, direct-hire positions only.
Role Description
This is a full-time, on-site role located in San Jose, CA, for a Senior Staff Infrastructure AI/ML RDMA RoCEv2 Engineer. The responsibilities include designing, implementing, and optimizing RDMA (Remote Direct Memory Access) and RoCEv2 (RDMA over Converged Ethernet version 2) solutions for AI/ML infrastructure. The engineer will analyze system performance, troubleshoot issues, and collaborate with cross-functional teams to enhance scalability and efficiency. Additional responsibilities involve developing and deploying high-performance networking solutions to meet complex AI/ML workload demands.
Qualifications
- Extensive experience with RDMA technologies and protocols, such as RoCEv2 and InfiniBand
- Proficiency in programming languages including C, C++, and Python
- Strong expertise in AI/ML workflows and scalable distributed systems
- Familiarity with HPC (High Performance Computing), GPUDirect technologies, or networking frameworks is a plus
- Experience designing and debugging low-latency, high-bandwidth data communication frameworks
- Excellent problem-solving skills and the ability to work collaboratively in cross-functional teams
- Master's or Ph.D. in Computer Science, Electrical Engineering, or a related field, or equivalent work experience