What are the responsibilities and job description for the Incident Manager position at Tekgence Inc?
Job Title: Incident Manager
Job Location: Philadelphia, PA (onsite)
Job Duration: Long term
Experience: 5 Years
Job Description:
- Run incident management processes for high-impact outages, ensuring rapid response, effective coordination, and timely resolution.
- Act as Incident Commander during critical incidents, driving cross-functional collaboration and decision-making under pressure.
- Apply Site Reliability Engineering (SRE) principles to enhance system reliability, scalability, and operational efficiency.
- Design and implement reliability architecture to minimize downtime and improve system resilience.
- Drive automation initiatives to streamline operational workflows, reduce manual effort, and improve response times.
- Establish and enhance observability practices, including monitoring, logging, and alerting, for proactive issue detection.
- Conduct root cause analysis (RCA) and lead systemic problem resolution to prevent recurrence of incidents.
- Manage large-scale change events with a focus on risk mitigation, stability, and minimal service disruption.
- Communicate effectively with executive stakeholders, providing clear updates, impact assessments, and resolution plans.
- Collaborate with engineering, operations, and business teams to continuously improve incident response and operational maturity.
Email: ajitkumar.rai@tekgence.com