What are the responsibilities and job description for the Senior Software Engineer - Together Cloud Infrastructure position at Together AI?
Job Summary:
Together AI is a research-driven artificial intelligence company focused on building the AI Acceleration Cloud, an end-to-end platform for the generative AI lifecycle. The Senior AI Infrastructure Engineer will be responsible for developing highly available cloud infrastructure that supports various AI services and ensures robust performance across global data centers.
Responsibilities:
• Perform architecture and research work for decentralized AI workloads
• Work on the core, open-source Together AI platform
• Create services, tools, and developer documentation
• Create testing frameworks for robustness and fault tolerance
Qualifications:
Required:
• 5 years of professional software development experience and proficiency in at least one backend programming language (Golang desired)
• 5 years of experience writing high-performance, well-tested, production-quality code
• Demonstrated experience building and operating high-performance and/or globally distributed microservice architectures across one or more cloud providers (AWS, Azure, GCP)
• Excellent communication skills – able to write clear design docs and work effectively with both technical and non-technical team members
• Strong systems knowledge across compute, networking, and storage, including concurrency, memory management, performant I/O, and scale
• Experience with infrastructure automation tools (Terraform, Ansible), monitoring/observability stacks (Prometheus, Grafana), and CI/CD pipelines (GitHub Actions, ArgoCD)
Preferred:
• Deep experience with Kubernetes internals (e.g., implementing non-trivial Kubernetes operators, device/storage/network plugins, custom schedulers, or patches to these or to Kubernetes itself) a big plus
• Deep experience with VMs/hypervisors (e.g., QEMU/KVM, Cloud Hypervisor, VFIO, virtio, PCIe passthrough, KubeVirt, SR-IOV) a big plus
• Deep experience with data center networking technologies (e.g., VLAN, VXLAN, VPN, VPC, OVS/OVN) a big plus
• Experience with Cluster API or similar a big plus
• Experience working on high-performance compute, networking, and/or storage a big plus
• Experience virtualizing GPUs and/or InfiniBand a big plus
• Experience building IaaS or PaaS systems at scale a plus
• Experience with DPUs/SmartNICs a plus
• Knowledge of GPU programming, NCCL, and CUDA a plus
Company:
Together AI is a cloud-based platform for building open-source generative AI and the infrastructure for developing AI models. Founded in 2022, the company is headquartered in San Francisco, California, USA, and has 201-500 employees. The company is currently in the growth stage. Together AI has a track record of offering H-1B sponsorship.