What are the responsibilities and job description for the Staff Infrastructure Engineer position at Groq?
Mission
At Groq, we are building a custom cloud from the ground up - one data center at a time. Our Compute Storage team owns the systems that turn racks of bare metal into production-ready Kubernetes clusters powering the next generation of AI workloads.
We are looking for a Staff Infrastructure Engineer to help us scale this effort. This is a hands-on role focused on fully automating deployment and lifecycle management of the Groq Cloud server fleet. You will work closely with DC, network and platform teams to define and develop tools and automation that enable seamless deployment and management of Groq compute nodes and storage clusters. We're looking for someone passionate about infrastructure who enjoys debugging close to the metal. If you're eager to grow your skills in deploying, scaling, and optimizing bare metal to support complex distributed HPC in the expanding inference market – we would love to talk.
Responsibilities & Opportunities In This Role
- Develop robust, scalable automation solutions (Go, Python, Bash) to streamline and standardize deployment workflows across global data center environments.
- Collaborate across functions with data center operations, networking, and platform teams to ensure infrastructure is fully integrated and production-ready.
- Develop automation to ensure all production machines and clusters consistently meet optimal health standards in a timely manner.
- Define best practices and standards for infrastructure-as-code and configuration management using Git, Flux, Terraform, and related tools.
- Set technical direction and maintain high-quality system documentation, operational runbooks, and internal tooling that improve the resilience, repeatability, and observability of the infrastructure stack.
Qualifications
- Experience with deploying and supporting Linux / Kubernetes systems at scale.
- Familiarity with infrastructure-as-code and Git-based workflows (e.g., Terraform, Flux, Kustomize).
- Ability to write and maintain basic tooling in common modern languages such as Go and Python.
- Understanding of networking fundamentals (IPAM, VLANs, DHCP, DNS).
- Working knowledge of storage concepts (block vs object, NFS, RAID, etc.).
- Strong sense of ownership and a willingness to work through ambiguity.
Nice to Have
- Experience provisioning physical machines in a data center environment.
- Exposure to Talos Linux, Kubernetes bootstrapping, or Kubernetes platform engineering.
- Previous collaboration with facilities, hardware, or network teams in an operational role.
Attributes We Value
- Humility - Egos are checked at the door
- Collaborative & Team Savvy - We make up the smartest person in the room, together
- Growth & Giver Mindset - Learn it all versus know it all, we share knowledge generously
- Curious & Innovative - Take a creative approach to projects, problems, and design
- Passion, Grit, & Boldness - No-limit thinking, fueling informed risk-taking
Salary: $132,100 - $279,800