Senior Site Reliability Engineer

Todyl
Denver, CO Full Time
POSTED ON 5/19/2026
AVAILABLE BEFORE 6/17/2026
About the Role

The Site Reliability Engineering team at Todyl exists to make our platform reliable, secure, and easy for engineering teams to ship to. We do that by building automation, self-service tooling, and operational standards that let developers move fast without putting customers at risk.

Our success is measured by how much production reliability and developer velocity we enable, not by how much work flows through us. This is a senior individual contributor role. You'll own end-to-end design and delivery of the Kubernetes-based platform initiatives that shape how Todyl runs production over the next 2–3 years, mentor and uplevel the rest of the SRE team, and operate as a peer to Architecture and Security on high-stakes platform decisions.

The team is small and rebuilding after recent transitions, and you'll work alongside our Principal SRE as one of the senior anchors of the function.

In this role, we're looking for someone who:

  • Has 5 years of Site Reliability Engineering or platform-engineering experience and has owned major platform initiatives end-to-end, from design through stabilization, staying with the work until it's truly done rather than declaring victory at deploy. They're recognized as the go-to person in their technical domain and create design documentation that their teams reference long after the work ships.
  • Mentors less-tenured engineers as a matter of practice. They grow the people around them through pairing, design partnership, and the example they set.
  • Sees SRE as a service to the engineering organization, not a gate. They build trust with developers and make other teams' jobs easier.
  • Treats security as a normal part of operating the platform, not an afterthought, and brings demonstrated experience designing systems with security as a first-class concern.
  • Gets energized by eliminating toil: looks at repetitive work and asks, "How do we make this go away?"
  • Actively uses AI tooling in their day-to-day work, and influences how the team adopts AI patterns safely.
  • Can communicate technical decisions clearly to engineers, engineering leadership, and non-engineering stakeholders, and is comfortable saying no or pushing back constructively when it matters.

What you'll do:

  • Own end-to-end design and delivery of flagship platform initiatives, designing for failure modes, graceful degradation, and the scale we expect 12 months from now rather than just today. The headline 12–18 month deliverable for this role is the golden-path platform: a developer-facing self-service path to production that enforces infrastructure best practices without requiring SRE involvement.
  • Drive security automation at platform scale, including patching cadence, secret rotation, access controls, and CVE remediation, as ongoing operational practices rather than reactive sprints.
  • Partner with product engineering teams at the architecture phase of high-stakes systems, helping shape the design rather than reviewing it the week before launch.
  • Operate as a peer to Architecture and Security on platform decisions that affect how Todyl runs production over the next 2–3 years.
  • Mentor less-tenured SREs through pairing, code review, and design partnership, with measurable improvement in their autonomy on design and incident work.
  • Contribute to one or more SRE practice improvements adopted by the team: incident commander discipline, postmortem maturity, change management standards, on-call quality, or design review cadence.
  • Build and operate the production platform: Kubernetes with Helm and ArgoCD, CI/CD pipelines, infrastructure-as-code (Terraform, Salt), observability (Grafana, Prometheus), secrets management, and AWS (including EKS). We're shifting from reactive to proactive, and we'd rather build guardrails than approve every deploy.
  • Drive cost visibility and efficiency across our cloud footprint, including AWS resource tagging, COGS attribution, and right-sizing across the platform, and quantify the business impact in terms that leadership can act on.
  • Participate in a weekly on-call rotation, resolve most issues independently, and own postmortems and follow-up actions for the incidents you respond to.
  • Plan and estimate honestly, break multi-quarter work into smaller increments, communicate delays early, and write tests for the automation you build because it runs in production.
  • Treat code review as a quality lever, not a checkbox. Catch missing tests, push back on tech debt, and watch dashboards and logs to verify your own changes after they ship.
  • Hand off what you've built, or make it self-managing, once it is mature and stable, rather than holding onto it forever.

Important note: We expect the person in this role to actively use AI tools, including tools like Claude, to accelerate automation development, reduce toil, and solve infrastructure problems more quickly. At the senior level, we also expect you to influence how the team adopts AI tooling: sharing patterns that work, flagging patterns that don't, and helping the team integrate AI safely into review, incident response, and automation workflows.

As part of our interview process, you'll work through a live AI-paired exercise with a couple of our engineers to see how you approach a real platform problem together.

We don't expect deep knowledge across every item below, but familiarity with several of these will help you ramp quickly.

Most importantly, we're looking for a strong technical background, the willingness to learn what you don't already know, and demonstrated experience operating production platforms at meaningful scale.

  • Kubernetes (EKS), Helm, ArgoCD, containerization
  • AWS (including EKS, ECR, and IAM) and cloud-native infrastructure
  • Infrastructure-as-code (Terraform, Salt)
  • CI/CD pipelines and GitOps (GitHub Actions, ArgoCD)
  • Observability stack (Grafana, Prometheus)
  • Linux at scale
  • Python or Bash for tooling
  • Networking fundamentals
  • Security-conscious infrastructure design (patching, secrets management, access controls)
  • Git and modern development workflows

Compensation Range: $165K - $185K

