What are the responsibilities and job description for the Incident Management Lead position at Quadrant Technologies?
Incident Management Lead
Location: Deerfield, IL (Onsite)
Full Time Only
Must Have Technical/Functional Skills
- 6 years of IT Service Management experience with a minimum of 3 years in a dedicated Major Incident Management or Incident Commander role in a large enterprise (Fortune 500 / FTSE 100 equivalent complexity).
- ITIL 4 Managing Professional or ITIL 4 Specialist: High Velocity IT certification (ITIL 4 Foundation minimum required).
- Demonstrable experience managing Azure platform incidents: working knowledge of Azure Monitor, Azure Service Health, Log Analytics, Application Insights, and Microsoft support escalation paths.
- Proven ability to command high-pressure P1 incidents involving 20 stakeholders across technical and executive levels simultaneously.
- Expert-level proficiency in ServiceNow ITSM, including the Incident, Problem, and Change modules and dashboard/report building.
- Strong data analysis skills: ability to analyze incident trends, build KPI dashboards, and present actionable insights to senior leadership.
Roles & Responsibilities
Major Incident Command & Coordination
- Serve as the single accountable owner for all P1 and P2 major incidents across on-premises and Azure-hosted services, from initial declaration through resolution and post-incident closure.
- Convene and chair live incident bridge calls and virtual war rooms using Microsoft Teams, coordinating across 10 internal technical resolver groups, managed service partners, and Microsoft Azure Support (Unified Support escalations).
- Drive swift triage by leveraging Azure Service Health, Resource Health, and Azure Monitor dashboards to rapidly establish scope, affected services, and blast radius within the first 15 minutes of an incident.
- Make and enforce escalation decisions, including engaging Microsoft CSS P1 Severity A support cases and activating DR runbooks where service restoration via normal means is not achievable within RTO.
- Maintain clear, timely, and audience-appropriate stakeholder communications throughout the incident lifecycle, including CEO/CISO executive briefings for business-critical outages.
Post-Incident Review & Continual Improvement
- Facilitate structured blameless Post-Incident Reviews (PIRs) within agreed SLAs (P1: 48 hours; P2: 5 business days); produce high-quality PIR reports consumed by the CTO and the Board Technology Committee.
- Own the incident action item registry; chair weekly SIP (Service Improvement Plan) reviews to ensure commitments are delivered on time and to quality.
- Identify systemic incident patterns through trend analysis using ServiceNow and Log Analytics; collaborate with Problem Management to drive root cause elimination for repeat incidents.
- Define, track, and report on enterprise incident management KPIs: MTTD, MTTR, incident recurrence rate, SLA compliance, and customer impact hours, presented to IT leadership in monthly operational reviews.
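For candidates who want a concrete picture of the KPI work described above, the following is a minimal illustrative sketch in Python, not a description of any tooling used in this role. The `Incident` record and its field names are hypothetical, and it assumes MTTR is measured from detection to resolution (some organizations measure from the start of the fault instead). Customer impact hours are left out because they depend on how affected users are counted.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class Incident:
    # Hypothetical fields for illustration; not a ServiceNow schema.
    started: datetime      # when the fault actually began
    detected: datetime     # when monitoring or a user raised it
    resolved: datetime     # when service was restored
    sla_target: timedelta  # agreed resolution target for the incident's priority
    is_repeat: bool        # True if linked to an earlier incident with the same cause


def minutes(delta: timedelta) -> float:
    """Convert a timedelta to minutes."""
    return delta.total_seconds() / 60


def kpi_summary(incidents: list[Incident]) -> dict[str, float]:
    """Return MTTD and MTTR in minutes, plus recurrence and SLA compliance in percent."""
    if not incidents:
        raise ValueError("need at least one incident")
    n = len(incidents)
    within_sla = sum((i.resolved - i.detected) <= i.sla_target for i in incidents)
    return {
        "mttd_min": mean(minutes(i.detected - i.started) for i in incidents),
        "mttr_min": mean(minutes(i.resolved - i.detected) for i in incidents),
        "recurrence_pct": 100 * sum(i.is_repeat for i in incidents) / n,
        "sla_compliance_pct": 100 * within_sla / n,
    }


# Two made-up incidents, both with a 4-hour resolution target.
t0 = datetime(2024, 1, 1, 9, 0)
sample = [
    Incident(t0, t0 + timedelta(minutes=10), t0 + timedelta(minutes=70),
             timedelta(hours=4), is_repeat=False),
    Incident(t0, t0 + timedelta(minutes=20), t0 + timedelta(minutes=320),
             timedelta(hours=4), is_repeat=True),
]
summary = kpi_summary(sample)
print(summary)
```

With these two sample incidents, the first is resolved in 60 minutes and the second in 300 minutes, so one of the two meets the 4-hour target and one is flagged as a repeat.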
Process Ownership & ITSM Governance
- Own, maintain, and continuously improve the enterprise Major Incident Management process, policy, playbooks, and runbooks aligned to ITIL 4 and the organization's IT Risk and Control Framework.
- Define and govern the incident severity classification matrix and escalation decision tree; ensure consistent adoption across all IT towers and managed service partners.
- Maintain and test the enterprise crisis communication framework, including stakeholder notification trees, bridge protocols, and executive communication templates.
- Collaborate with Change Management to ensure CAB processes adequately assess change-induced incident risk; maintain correlation tracking between changes and incidents.
Azure Operations & Cloud Incident Specifics
- Develop and maintain Azure-specific incident playbooks covering platform scenarios: AKS node/pod failures, Azure SQL failover events, ExpressRoute circuit drops, Azure Active Directory (Entra ID) authentication outages, and Azure region-wide service incidents.
- Maintain working relationships with the Microsoft TAM (Technical Account Manager) and the Azure Rapid Response team; ensure escalation paths to Microsoft CSS are exercised and SLAs are understood.
- Monitor Azure Service Health and the Microsoft 365 Service Health Dashboard proactively; initiate pre-emptive incident declarations for advisory/degraded-service notifications affecting business-critical services.
- Participate in Azure Operational Reviews with Cloud Platform and SRE teams to identify observability gaps, alerting blind spots, and runbook deficiencies before they manifest as major incidents.
Capability Building & Stakeholder Engagement
- Design and deliver MIM process training programmes for Level 1/2 Service Desk, resolver groups, and technology leadership; conduct quarterly simulation exercises (GameDay / IncidentEx).
Salman Shaikh
Quadrant Technologies 1- (Cell)
Salary: $100,000 - $130,000