What are the responsibilities and job description for the SRE Program Manager position at TechnoSphere, Inc.?
Job Title: SRE Program Manager
Location: Miami, FL (Hybrid Model) (Local Candidates are preferred)
H4 EAD, H-1B, and Green Card holders and US citizens are eligible
Experience: 5 years
We are looking for a highly experienced SRE Program Manager with a solid software development background. The ideal candidate will have 5 years of total professional experience and at least 5 years in program management.
This role requires strong communication skills and a proven ability to lead and manage multiple teams and to deliver high-impact programs that enhance reliability, scalability, and operational excellence.
The ideal candidate will combine program management skills with hands-on SRE knowledge to ensure high availability, performance, and reliability of our services while driving operational excellence.
Key Responsibilities:
Program Management & Leadership
Lead and manage complex operational support programs, ensuring alignment with business objectives and SLAs.
Coordinate cross-functional teams (engineering, operations, support) to deliver seamless production support.
Drive program planning, execution, risk management, and reporting.
Operations & Reliability
Establish and monitor SLOs, SLIs, and SLAs to maintain service reliability and uptime.
Implement incident management, root cause analysis, and post-mortem reviews to drive continuous improvement.
Define and track key operational metrics; ensure compliance with reliability and performance goals.
Drive operational efficiency through process optimization, change management, and strategic planning.
Incident Management: Own major incident response and postmortem processes; ensure root cause analysis and long-term resolutions.
Platform Operations: Oversee cloud infrastructure, CI/CD pipelines, observability, tooling, and configuration management.
SRE Practices
Apply SRE principles to optimize system reliability, scalability, and efficiency.
Automate operational tasks and processes (CI/CD, monitoring, alerting, runbooks, etc.).
Partner with engineering teams to embed reliability into system design and deployment.
Next-Gen Capabilities: Introduce next-generation capabilities in AI, GenAI, and Agentic AI to improve SRE KPIs.
Stakeholder Management
Act as the primary liaison between business units and technical teams for all operations-related initiatives.
Provide transparent communication on program progress, incidents, risks, and resolutions to senior leadership.
Build strong relationships with internal and external stakeholders to drive collaboration and accountability.
Strategic Leadership: Define and drive the SRE and Infrastructure strategy with business and IT stakeholders.
Cross-Functional Initiatives: Drive cross-functional strategic initiatives and run programs from ideation through execution.
Collaboration: Collaborate with other teams to align program deliverables and success metrics.
Security and Compliance: Collaborate with InfoSec teams and identify efficiency opportunities across systems and services.
Reliability and Performance: Lead initiatives to improve the availability, latency, and efficiency of services.
Qualifications:
Minimum 1 year of relevant experience in SRE, DevOps, or Infrastructure Engineering.
Familiarity with SRE principles and key SRE currencies defined by Google or a similar framework.
Minimum 5 years of leadership experience managing technical teams in a 24x7 model.
Deep expertise in cloud computing, container orchestration, automation, and observability.
Strong understanding of modern software delivery practices (CI/CD and GitOps).
Proven track record in incident response, system architecture, and operational excellence.