What are the responsibilities and job description for the Senior Staff Infrastructure Engineer, GroqCloud position at Groq?
Mission: Design, build, and operate large-scale cloud systems to deliver the fastest inference engine in the world.
Responsibilities & Opportunities In This Role
- Infrastructure Development: Design, build, and automate cloud infrastructure using Terraform to support a wide variety of needs.
- Service Deployment & Orchestration: Build and manage robust deployment pipelines and GitOps workflows into Kubernetes-based environments. Continuously improve CI/CD processes to facilitate rapid, reliable rollouts of new features and services, ensuring minimal downtime and maximum velocity.
- System Troubleshooting: Lead investigations to determine root causes of system failures and develop scripts to repair and automate the upkeep of infrastructure components.
- Observability Enhancement: Implement comprehensive monitoring (tracing, metrics, logging, alerting) to swiftly pinpoint, diagnose, and resolve system issues.
- Efficient Incident Response: Manage critical system incidents as a first responder, ensuring swift resolution and comprehensive post-incident analyses with implemented remediations.
- Cross-Functional Collaboration: Collaborate with software engineers, platform and networking engineers, product managers, and sales to enable feature delivery.
Qualifications
- 10 years of experience in software engineering or a related field.
- 5 years of experience with GCP (especially VPC, Hybrid Networking, IAM, and GKE).
- Actively working with modern Infrastructure-as-Code technologies (Kubernetes, Terraform, Flux/ArgoCD, Kustomize, Crossplane).
- Experience with open-source monitoring tools (Prometheus, Grafana, VictoriaMetrics, VictoriaLogs, and Alertmanager).
- Deep experience in cloud technologies, global scale applications, and automation.
- Familiarity with multi-region deployments, including the associated networking, latency, and failover challenges.
- History of debugging and mitigating production issues and driving them to efficient resolution.
- Comfortable reading, writing, and debugging software in multiple languages, especially Go and Rust.
- Thorough understanding of cloud-security best practices and modern compliance controls.
Salary: $282,100 - $331,900