What are the responsibilities and job description for the Incident Manager (SRE / Operations) position at RealTek Consulting?
Job Title: Incident Manager (SRE / Operations)
Location: Philadelphia, PA (100% Onsite – Day 1)
Duration: 12 Months
Open Positions: 14
⚠️ Critical Notes:
- 100% Onsite from Day 1 (Philadelphia, PA)
- Immediate hiring – bulk positions (14 openings)
- Virtual interview drive scheduled soon – fast turnaround required
Job Summary:
We are seeking experienced Incident Managers with strong expertise in SRE, operations engineering, and incident command. The ideal candidate will lead high-impact incident response, ensure system reliability, and drive cross-functional coordination during outages and large-scale system events.
Key Responsibilities:
- Lead incident command and management for critical production issues
- Coordinate cross-functional teams during high-severity incidents
- Drive root cause analysis (RCA) and implement preventive measures
- Manage system reliability and operational stability
- Collaborate with SRE, DevOps, and engineering teams
- Ensure effective communication with stakeholders and leadership
- Drive automation and observability improvements
- Handle large-scale change events and system outages
- Maintain incident reports, documentation, and post-mortem analysis
- Continuously improve incident response processes and frameworks
Required Skills & Experience:
- 6–8 years of experience in:
  - Incident Management / Production Support / SRE roles
- Strong expertise in:
  - Incident Command & Crisis Management
  - Site Reliability Engineering (SRE)
  - Operations Engineering
- Strong knowledge of:
  - Reliability architecture and system design
  - Automation and observability tools
- Proven ability to:
  - Lead teams during high-impact outages
  - Drive systemic problem resolution
- Excellent executive communication and stakeholder management skills
Technical Skills:
- Incident Management
- SRE / Operations Engineering
- Monitoring & Observability Tools
- Automation & Reliability Engineering
Preferred Qualifications:
- Experience in enterprise-scale production environments
- Strong analytical and problem-solving skills
- Ability to work in high-pressure, fast-paced environments
Key Deliverables:
- Rapid and effective incident resolution
- Improved system reliability and uptime
- Well-documented RCA and post-incident reports
- Strong coordination across technical and business teams