What are the responsibilities and job description for the Site Reliability Engineer position at Scale.jobs?

About The Role

The role focuses on the reliability, scalability, and performance of large-scale distributed systems. This involves balancing the development of new infrastructure features with the rigorous operational oversight required to maintain high availability across global production environments.

The team builds and manages the underlying platform that powers the developer experience. This includes automating manual processes, improving system observability, and refining incident response workflows to ensure the platform can scale alongside rapid user growth.

Key Responsibilities

Design and implement Infrastructure-as-Code (IaC) using Terraform or Pulumi to manage multi-region cloud environments on AWS or GCP
Develop and maintain Kubernetes-based container orchestration platforms, including service mesh configurations and custom controllers
Build automated observability pipelines using Prometheus, Grafana, and Jaeger to monitor system health and provide distributed tracing
Participate in on-call rotations, lead incident response and post-mortem analyses, and implement preventative measures to eliminate recurring failure modes
Optimize CI/CD pipelines for speed and safety, ensuring high-frequency deployments do not compromise system stability or security
Engineer automated self-healing mechanisms and scaling policies to handle unpredictable traffic spikes in production

What We Are Looking For

3–7 years of experience in SRE, DevOps, or Infrastructure Engineering roles managing high-traffic production systems
Expertise with containerization and orchestration tools, specifically Docker and Kubernetes (EKS, GKE, or self-managed)
Strong proficiency in at least one backend language used for systems automation, such as Go, Python, or Rust
Hands-on experience with cloud-native networking, including VPCs, load balancing, DNS, and CDN configurations
Deep understanding of Linux internals, performance tuning, and troubleshooting distributed systems at scale
Bonus: Experience with service mesh technologies (Istio/Linkerd), eBPF for profiling, or managing large-scale NoSQL databases
