Senior Manager, Site Reliability Engineering

Catalyst Brands
Dallas, TX Full Time
POSTED ON 6/27/2026
AVAILABLE BEFORE 4/21/2027

Overview

The Site Reliability Engineering Manager is responsible for overseeing the daily operations and delivery of the Site Reliability Engineering teams. This role plays a key part in driving team productivity and ensuring the ongoing health, performance, resilience, and stability of Catalyst’s eCommerce and CRM platforms.

In addition to managing operational aspects, the SRE Sr. Manager actively contributes to the technical direction of the team. This includes shaping the automation strategy, guiding telemetry and observability practices, leading solution delivery, and managing incidents and problems affecting platform reliability.

This is a hybrid leadership role that combines technical expertise with people management. The SRE Manager also contributes to both short- and long-term planning initiatives spanning systems architecture, team development, and organizational strategy.

What You Will Do:

Team Leadership & Project Management
  • Provide both technical and people leadership to Site Reliability Engineering (SRE) teams through regular one-on-one meetings, team syncs, and performance reviews.
  • Manage project execution by organizing cross-functional teams, assigning responsibilities, and tracking progress against defined schedules and milestones.
  • Assist in budgeting, workforce planning, hiring, and third-party contract negotiations to support team growth and operational goals.

Platform Reliability & Operational Excellence
  • Drive continuous improvements in platform reliability, stability, and performance by overseeing the deployment of fully automated telemetry, observability, and AI-driven monitoring solutions.
  • Lead the development and enhancement of intelligent alerting and automated incident response systems to improve service restoration speed and issue detection.
  • Collaborate with administrators and platform engineers on implementation decisions to ensure highly reliable infrastructure, systems, and integrations.
  • Document all changes in accordance with change control policies and documentation standards; identify risks and recommend corrective actions when necessary.

Incident & Problem Management
  • Provide advanced Incident Management and Problem Management support by analyzing telemetry data and system logs to identify, remediate, and prevent reliability issues.
  • Participate in on-call escalation support rotations in alignment with the 24/7/365 support model.
  • Act as the Escalation Manager/Critical Incident Manager during major incidents, guiding teams through structured and effective service recovery.
  • Communicate timely updates and incident reports to senior leadership during and after critical events.

Stakeholder Collaboration & Support
  • Lead conversations and provide business and engineering support for both internal stakeholders and external customers.

What You Will Need:

Experience & Leadership
  • 10 years of experience in global organizations, with a proven ability to communicate effectively across all levels, from executives to individual contributors.
  • 5 years of hands-on Site Reliability Engineering (SRE) experience, including platform automation, telemetry, observability, and self-healing systems.
  • Demonstrated leadership and collaboration in high-availability, mission-critical digital environments.
  • Strong support knowledge and understanding of retail eCommerce flows across web and mobile technologies.
  • Work with software engineers across scrum teams and performance engineering to ensure systems are meeting reliability and performance standards.
  • Hands-on experience with debugging, optimizing code, and automation.
  • Identify opportunities to adopt innovative technologies and continuous improvement: automation, shift left, self-heal.

Platform & Application Support
  • Extensive experience supporting and administering digital retail and eCommerce platforms with one of the cloud providers (AWS/Azure/Google Cloud).
  • Demonstrated experience in application design, software development, testing, and production support of Java/J2EE-based eCommerce applications.
  • Practical experience monitoring and maintaining streaming platform technologies such as Apache Kafka.
  • Deep understanding of cloud-native architectures and platform operations.

Monitoring, Telemetry & Observability
  • Proficient with modern monitoring, logging, and telemetry tools, including New Relic, Splunk, ELK, Datadog, DynaTrace, Catchpoint, and AWS CloudWatch.
  • Hands-on experience designing and implementing automated health checks, observability pipelines, and self-healing solutions.

Automation & Infrastructure as Code (IaC)
  • Strong experience with automation tools and frameworks, such as Jenkins, Chef, Ansible, and Terraform.
  • Expertise in scripting languages used for platform automation and diagnostics: PowerShell, Python, Ruby, AWK, SED, etc.

Cloud, Networking & Systems Knowledge
  • Advanced experience with public cloud platforms: Microsoft Azure and Amazon Web Services (AWS).
  • Solid understanding of networking fundamentals: TCP/IP, DNS, DHCP, WINS.
  • Advanced experience with Content Delivery Networks (CDNs) such as Akamai and Cloudflare.

Tooling & Operational Practices
  • Experience using ITSM and collaboration platforms: Jira, BMC Remedy, ServiceNow.
  • Strong understanding of IT operations frameworks (e.g., ITIL, MOF).

Education & Certifications
  • Bachelor’s degree in computer science or a related technical field.
  • Relevant technical certifications are a plus, including Azure/AWS, Microsoft, and ITIL.

Pay Range

USD $103,500.00 - USD $172,500.00 per year
