Sr. Site Reliability Engineer

Practice by Numbers
Redmond, WA Full Time
POSTED ON 4/1/2026
AVAILABLE BEFORE 4/29/2026
This is an engineering-first Senior SRE role.

We’re Looking For Senior Engineers Who Have

  • Built and shipped significant backend systems and/or distributed platforms
  • Owned services end-to-end in production (design → launch → on-call → reliability improvements)
  • Led incident response and driven durable follow-ups
  • Improved reliability by writing software and changing system design—not by adding manual process

You’ll partner closely with product engineering to ensure reliability is designed in from day one, while also building the tooling and platforms that make operating services safer and easier for every engineer.

Engineers here own services end-to-end—from design to production reliability.

Important: This is not a system administrator role. We are explicitly hiring an engineering leader in reliability. An engineering degree is an absolute requirement (BS/MS in CS/CE/EE or a closely related engineering field).

What You’ll Do

  • Own reliability outcomes for critical services: availability, latency, incident rate, and recovery time.
  • Design and build reliable, scalable distributed systems that support mission-critical healthcare workflows.
  • Define and operationalize SLOs/SLIs and error budgets; drive adoption across teams and use them to prioritize work.
  • Lead incident response for high-severity issues; improve on-call effectiveness and reduce alert fatigue.
  • Run blameless postmortems and ensure follow-ups are implemented, measured, and stick.
  • Write software to eliminate operational toil: automation, self-service tooling, guardrails, and developer platforms.
  • Raise the bar on observability (metrics/logs/traces), alerting strategy, and operational readiness.
  • Improve resilience through capacity planning, load testing, performance tuning, and failure testing.
  • Mentor engineers (SRE and product engineers) on reliability practices, debugging, and production ownership.
  • Drive cross-team improvements like production readiness reviews, release safety (progressive delivery), and standard runbooks.

What We’re Looking For

Required

  • Engineering degree is mandatory: BS/MS in Computer Science, Computer Engineering, Electrical Engineering, or a closely related engineering field.
  • 6 years of experience in software engineering, SRE, infrastructure/platform engineering, or a related field.
  • Strong programming skills in Go, Python, Java, or similar (production-quality code).
  • Proven experience building and operating production backend services or distributed systems.
  • Meaningful experience in on-call rotations, incident leadership, and post-incident improvement execution.
  • Strong debugging ability across complex systems: latency, saturation, cascading failures, dependency issues.
  • Experience with cloud infrastructure (AWS preferred, GCP/Azure acceptable).

Strong Signal

  • You’ve owned reliability for customer-facing services with clear, measurable improvements (e.g., higher availability, lower MTTR).
  • You’ve built internal platforms/tooling that made other engineers faster and reduced operational burden.
  • You’ve worked in an SRE culture with SLOs, error budgets, and blameless postmortems.
  • You’ve led multi-quarter reliability initiatives spanning multiple teams/services.

Technologies We Work With (Examples)

  • Cloud: AWS
  • Containers: Docker, Kubernetes
  • Infrastructure as Code: Terraform
  • Observability: Prometheus, Grafana, OpenTelemetry
  • Languages: Go, Python, TypeScript
  • CI/CD: GitHub Actions

(Experience with everything isn’t required—strong fundamentals and learning velocity matter most.)

What This Role Is Not

To be explicit, this role is not:

  • System administration / IT ops / helpdesk
  • Manual server patching as a primary responsibility
  • A “click-ops” cloud operator role

This is a senior engineering role focused on software-driven reliability and platform engineering.

Why Join PBN

  • Build and operate mission-critical healthcare infrastructure that supports real patient workflows.
  • High impact: reliability work directly improves customer trust and revenue-critical operations.
  • Small team with high ownership, autonomy, and ability to influence architecture.
  • Strong engineering culture focused on automation, simplicity, and measurable outcomes.

Compensation

The base pay range for this role is $120,000 – $150,000 per year.




