Platform Site Reliability Engineer II

Todyl
Atlanta, GA Full Time
POSTED ON 4/14/2026
AVAILABLE BEFORE 5/10/2026
About The Role

At Todyl, our Application Platform Engineering team is dedicated to building infrastructure, services and patterns that enable our application development teams to quickly and safely deploy services at the core of our security offering. As a member of this innovative team, you will play a pivotal role in designing and engineering cutting-edge solutions that are highly performant, highly resilient and low maintenance. Your work will not only directly impact the reliability and security of our platform but also empower the engineering team to continuously push the boundaries of what's possible in the security space.

Responsibilities:

  • As a Platform SRE (Site Reliability Engineer) at Todyl, you will be responsible for developing tools and services that support Todyl's application hosting infrastructure, including but not limited to Kubernetes-based environments.
  • Build automation to improve reliability and reduce human interaction for Day 2 Operations, with an emphasis on infrastructure-as-code practices.
  • Implement and enforce security policies, access controls, and system patching — treating security hygiene as a first-class operational responsibility rather than an afterthought.
  • Own attack surface management for production infrastructure: identify exposure, prioritize remediation, and drive CVE resolution to completion rather than leaving findings unactioned.
  • Operationalize security tooling by building integrations, establishing remediation workflows, and ensuring findings are consistently acted upon.
  • Own features and services through deployment and stabilization — work isn't done until it's stable in production and documented.
  • Collaborate with product and engineering teams to deliver solutions that meet the needs of stakeholders and the business.
  • Improve application monitoring and alerting to minimize time to detect and time to restore; review dashboards and logs to verify deployments succeeded.
  • Identify and drive cost-optimization opportunities, including resource labeling, right-sizing, and efficiency improvements, to reduce COGS.
  • Participate in a weekly on-call rotation, resolve most issues independently, and update runbooks and documentation after incidents.

Requirements

  • MUST HAVE: Experience managing Kubernetes and applications running on Kubernetes.
  • MUST HAVE: General competency in one or more scripting or programming languages, such as Python or Bash.
  • MUST HAVE: Demonstrated experience identifying and remediating vulnerabilities in production infrastructure, including CVE triage and remediation workflows.
  • Experience managing production Linux systems at scale.
  • Working knowledge of REST APIs.
  • Familiarity with networking fundamentals and common attack surface concepts (exposed services, misconfigured access controls, unpatched dependencies).
  • Comfort with cloud security tooling and the ability to operationalize findings into actionable remediation work.
  • Comfort with cloud cost management concepts, including resource tagging and cost attribution strategies.
  • Breaks work into incremental deliverables; communicates delays early and tracks progress against estimates.
  • Writes and maintains tests for the automation and tooling they build; proactively considers edge cases and failure conditions.
  • Ability to quickly learn new concepts, frameworks, and technologies, including AI-assisted tools to accelerate development and reduce toil.
  • Comfortable building and maintaining production services with a strong sense of ownership from build through stabilization.
  • Production experience using CI/CD for code deployment.
  • Experience with on-call rotations and incident response processes.

What we offer

  • Health & Wellbeing
    • Medical, dental, and vision coverage for you and your family
    • HSA/FSA options
    • Life insurance and short- and long-term disability coverage
  • Financial & Future
    • Competitive 401(k) to invest in your future
    • Short- and long-term disability coverage for when life gets unpredictable
  • Flexibility & Time Off
    • Hybrid work schedule
    • Flexible PTO
    • 13 company holidays
    • Generous parental leave
Todyl provides equal employment opportunities to all employees and applicants for employment without regard to race, color, religion, gender, sexual orientation, transgender status, gender identity or expression, national origin, age, disability, marital status, genetic information, military status or any other status protected by applicable federal, state or local laws.

Compensation Range: $130K - $160K