What are the responsibilities and job description for the Manager, Site Reliability Engineering and Incident Management position at Planet DDS?

Planet DDS is a leading provider of a cloud-based platform of solutions that empowers growth-minded dental businesses. Now serving over 13,000 practices and 118,000 customers in North America, Planet DDS delivers a comprehensive suite of solutions, including Denticon Practice Management, Cloud 9 Ortho Practice Management, and Apteryx Cloud Imaging. Planet DDS is dedicated to enabling dental support organizations (DSOs) and groups to grow and thrive with technology that delivers seamless integrations, improved workflows, and future-proof scalability.

We are seeking a Manager, Site Reliability Engineering and Incident Management, to manage our Site Reliability Engineering function as well as the external incident response function for our production operations. To be successful, the manager will need to be self-motivated, communicate clearly, and operate with a sense of urgency in a fast-paced environment. Providing operational support means that you will apply your customer empathy to production incidents and to any other internal engineering-related support requests. It will be crucial for you to gain a deep understanding of our systems and architecture and to build hands-on knowledge of support and observability tooling. You will need to be available to engage in incident escalations 24x7. You will also need to seek answers from subject matter experts in a variety of positions, from architects and support staff to business leaders and technically minded developers.

Location: East Coast (US)

Job Duties

Team Leadership & Development

Lead and mentor a team of SREs and Incident Managers.
Foster a culture of reliability, accountability, and continuous improvement.
Collaborate with engineering teams to design resilient platform architectures.

Incident Management

Oversee the incident response process for outages and service disruptions.
Ensure timely detection, escalation, and resolution of incidents.
Drive post-incident reviews (PIRs) and root cause analysis.
Implement improvements based on lessons learned to prevent recurrence.

Operational Excellence

Mature and enforce best practices for incident response and runbooks.
Automate operational tasks to reduce toil and improve efficiency.
Maintain observability tools (monitoring, alerting, logging).

Process & Governance

Define and maintain incident management policies and escalation procedures.
Drive initiatives for chaos engineering, capacity planning, and disaster recovery testing.

Skills and Qualifications

7 years in SRE, DevOps, or Infrastructure roles.
3 years in Incident Management leadership.
Deep understanding of reliability, scalability, and performance optimization.
Multi-cloud expertise in AWS, Azure, or GCP.
Understanding of DNS, load balancing, firewalls, and compliance frameworks.
Security is part of everything we do, so the role requires knowledge of fundamental cloud security (e.g., identity and access management, firewalls).
Deep understanding of logging, monitoring, and security best practices.
Strong collaboration and communication skills.
Bachelor’s degree in a relevant major or equivalent years of experience is a plus.

Any of the following would be a plus: 

Dental industry knowledge 
Experience working in B2B SaaS companies 
Experience with cloud containers, specifically Kubernetes

PLANET DDS CORE IDEOLOGY

Why are we here?

Dental software is broken. We aim to fix it.

Where are we headed?

To be the first choice for growth-minded dental businesses.

How do we get there?

To encourage measurable progress toward our vision and make the best decisions on behalf of employees and customers, we adopted a set of common values:

Collaborative – Working independently and across teams, we create scalable solutions to enable company growth
Empathetic – We are educated on the experience of our customers and feel vested in their success
Accountable – We feel ownership for the quality of our work and take pride in the positive outcomes
Trustworthy – We operate with integrity and honesty, making promises we know we can keep
Ambitious – We are driven by our ability to make a long-term, positive impact on the lives of dental market leaders

Planet DDS is an Equal Opportunity Employer – Including Disability/Veterans
