What are the responsibilities and job description for the Enterprise Incident & Reliability Manager position at Centraprise?
Enterprise Incident & Reliability Manager
Mount Laurel, NJ or West Chester, PA (5 days on-site)
Full-time
Job Description:
Incident Manager
Must Have Technical/Functional Skills:
- Incident management
- SRE and operations engineering
- Reliability architecture
- Automation and observability
- Executive communication
Roles & Responsibilities:
- As Incident Manager, provide technical leadership for enterprise-wide, high-severity incidents, problem investigations, and high-risk changes, while shaping reliability strategy, governance, and operational standards across complex, distributed platforms.
- Drive incident resolution by directing cross-functional teams through high-impact outages, systemic problem resolution, and large-scale change events.
- Create scripts in ELK, Grafana, AppDynamics, and COP.
- Auto-execute predefined queries in ELK, Grafana, AppDynamics, and COP for real-time issues.
- Attach live query outputs (metrics, logs, traces) directly to alerts and incidents.
- Eliminate manual tool navigation for the IM and alert teams.
- Enhance alert systems with contextual intelligence: metric deviations and anomaly trends, relevant log snippets and patterns, and affected CIs with their downstream impacts.
Education:
- Minimum qualification: bachelor's degree