What are the responsibilities and job description for the Site Reliability Engineering Manager position at TBG | The Bachrach Group?

A leading global investment management firm is seeking an exceptional SRE Manager to lead and transform their Site Reliability Engineering function. This is a newly created leadership role offering a unique opportunity to build and scale a world-class global SRE team while driving the organization's evolution toward modern, cloud-based infrastructure solutions.

The Role

As SRE Manager, you will lead the strategic expansion of site reliability practices across a global organization, transforming operational workflows into proactive, automation-driven processes. You'll build and mentor a high-performing team responsible for ensuring the reliability, scalability, and performance of both cloud and on-premises infrastructure and services.

Key Responsibilities

Team Leadership & Development

Lead the growth and development of a global SRE team from a small, high-performing unit to a comprehensive function
Oversee recruitment, onboarding, and professional development initiatives
Design and deliver tailored training programs on SRE principles, cloud operations, and automation tools
Foster a culture of excellence, collaboration, and continuous improvement

Strategic Transformation

Evaluate current operational workflows, RACIs, and skill assessments across the global team
Execute a comprehensive roadmap to transition reactive operations into proactive, SRE-aligned processes
Identify and eliminate toil through automation and process optimization
Define and implement sustainable automation frameworks to reduce operational risk

Technical Excellence

Collaborate with architects, platform engineering, ServiceNow developers, and application teams to design and implement comprehensive observability frameworks
Define, monitor, and regularly review SLIs, SLOs, SLAs, and error budgets
Enhance proactive incident detection capabilities and reduce MTTR
Oversee incident response processes and champion blameless post-mortem culture across teams

Stakeholder Management

Build strong partnerships with internal and external stakeholders
Prepare and present operational performance reports to leadership
Drive alignment between SRE and application development teams

Required Qualifications

Leadership Experience

Proven track record of building and leading operational and engineering teams
Demonstrated ability to foster collaboration between SRE and development teams
Experience driving operational excellence and reducing downtime while accelerating delivery cycles

SRE & Incident Management Expertise

Strong experience applying SRE principles, including defining and monitoring SLIs, SLOs, SLAs, and error budgets
Skilled in incident response, post-incident analysis, and facilitating blameless post-mortems
Track record of implementing proactive measures to prevent recurring incidents

Technical Proficiency

Deep expertise in Azure technologies (experience with other cloud providers highly beneficial)
Proven experience with Infrastructure as Code tools, particularly Terraform
Hands-on experience with monitoring and observability tools such as LogicMonitor, Azure Monitor, Prometheus, Grafana, Dynatrace, and Splunk
Strong scripting or programming skills (Python, PowerShell)
Experience with ServiceNow and Azure DevOps
Understanding of container orchestration platforms
Solid grasp of Agile, ITIL, and ITSM frameworks

Preferred Qualifications

Experience managing other managers
SharePoint administration experience
Demonstrated success spearheading automation initiatives that significantly reduced infrastructure provisioning time

Soft Skills

Excellent communication and presentation abilities
Strong stakeholder management capabilities
Strategic thinking with hands-on execution ability

Why This Role?

This is a rare opportunity to shape the SRE function of a global organization from the ground up. You'll have the autonomy to build processes, develop talent, and drive meaningful transformation that directly impacts business outcomes. If you're passionate about reliability engineering, team development, and driving operational excellence at scale, this role offers the platform to make a significant impact.

Salary: $155,000 - $180,000
