
Major Incident Manager

Yochana
Wilmington, DE Full Time
POSTED ON 4/15/2026
AVAILABLE BEFORE 5/14/2026

Job Title: Major Incident Management (MIM) & NOC Lead (10 Years)

Location: Onsite – Wilmington, DE (Day 1 Onsite)

Employment Type: Full-time

Experience: 10 years in IT Operations / NOC / Major Incident Management, including leadership ownership.


Role Summary:

The Major Incident Management & NOC Lead is responsible for end-to-end command and control of the enterprise’s 24x7 operational monitoring and incident response. This role leads the MIM and NOC function, drives Major Incident (P1/P2) execution, ensures rapid service restoration, and continuously improves operational maturity through problem management, automation, observability enhancements, and SLA governance.

This role requires a mix of strong incident leadership, technical depth across infrastructure and applications, and people/process management to ensure stability, availability, and performance across critical services.

Key Responsibilities:

A) Major Incident Management (Command & Control)

  • Own the Major Incident (P1/P2) process from detection to resolution, including war-room leadership, stakeholder updates, and closure.
  • Act as the Incident Commander and ensure structured triage, containment, workaround, and restoration.
  • Drive cross-functional coordination (App, Infra, Network, Security, DB, Cloud, Vendor teams) to reduce MTTR.
  • Ensure high-quality incident communications: executive summaries, impact analysis, ETAs, customer/business comms.
  • Lead and facilitate Post Incident Reviews (PIR/RCA); ensure actionable corrective/preventive actions (CAPA).
  • Identify recurring issues and trigger Problem Management with measurable reduction plans.

B) NOC Leadership & Operations

  • Lead the NOC team responsible for 24x7 monitoring, alert triage, event correlation, escalation, and ticket quality.
  • Establish/maintain standard operating procedures (SOPs), runbooks, escalation matrices, and on-call models.
  • Ensure NOC meets SLAs/OLAs, improves alert fidelity, and reduces noise through tuning and automation.
  • Manage handover governance between shifts; maintain service continuity and operational hygiene.

C) Service Reliability & Continuous Improvement

  • Drive operational improvements: monitoring coverage, SLO/SLA alignment, incident prevention, and resiliency initiatives.
  • Partner with Engineering/Platform teams on observability strategy, proactive detection, and reliability patterns.
  • Track and report operational metrics: MTTD, MTTR, incident volume, re-open rate, SLA compliance, and trends.
  • Support readiness for audits and compliance: evidence collection, process adherence, and risk mitigation.

D) Stakeholder & Vendor Management

  • Interface with business stakeholders, service owners, and leadership to provide incident status, risk, and remediation plans.
  • Manage vendor escalations and ensure timely resolution aligned to contractual SLAs.

E) Managerial / Leadership Skills (Must Have)

  • Proven experience leading MIM & NOC Operations teams (shift-based or on-call models).
  • Strong Incident Commander capability: calm under pressure, structured decision-making, priority trade-offs.
  • Excellent stakeholder management across technical teams and business leadership.
  • Ability to build and enforce process discipline (ITIL-aligned), while improving speed and quality.
  • Strong coaching/mentoring: performance management, skill development, hiring support as needed.
  • Effective communication: concise executive updates, clear action plans, facilitation of PIR/RCA sessions.
  • Data-driven mindset: uses metrics and trend analysis to drive operational outcomes.

Technical Skills (Must Have):

A) Monitoring / Observability

  • Hands-on experience with NOC tooling and observability platforms such as Splunk / ELK, Datadog, Dynatrace, New Relic, AppDynamics, Prometheus/Grafana, and CloudWatch/Azure Monitor.
  • Strong understanding of event correlation, alert tuning, noise reduction, and dashboarding.

B) Incident / ITSM Platforms

  • Strong working knowledge of ServiceNow (Incident, Problem, Change, Knowledge, CMDB) or equivalent ITSM tools.
  • Experience designing workflows, SLAs/OLAs, routing rules, and automation integrations.

C) Infrastructure & Platform Breadth

  • Solid understanding across:
      • Windows/Linux administration basics
      • Network fundamentals (DNS, DHCP, TCP/IP, routing, load balancers, firewalls)
      • Compute/virtualization (VMware/Hyper-V) and storage concepts
      • Database fundamentals (SQL/Oracle, replication, performance symptoms)
      • Cloud fundamentals and operational support for AWS/Azure/GCP: IAM basics, networking (VPC/VNet), scaling, logging/monitoring, common failure patterns

D) Automation & Scripting (Good to Have / Preferred)

  • Scripting knowledge: PowerShell / Python / Bash
  • Familiarity with automation tools: Ansible, Terraform, CI/CD operational workflows.
  • Ability to create/maintain runbook automation and self-healing patterns.

E) Security & Resilience (Preferred)

  • Awareness of security operations touchpoints: DDoS symptoms, certificate expiries, IAM issues, endpoint/EDR alerts.
  • Familiarity with BCP/DR processes, failover testing, and resilience design collaboration.

F) ITIL / Process Expectations

  • Strong ITIL understanding across Incident, Problem, Change, Knowledge, and Service Level Management.
  • Ability to implement governance around:
      • Change risk assessment, change windows, incident-change correlation
      • RCA quality, action item tracking, and effectiveness validation

Qualifications:

  • Bachelor’s degree in Computer Science / IT / Engineering, or equivalent experience.
  • ITIL v4 Foundation (preferred).
  • Cloud certifications (preferred): AWS/Azure fundamentals or associate level.
  • Experience in enterprise production environments with stringent availability requirements.

Success Metrics / KPIs

  • Reduced MTTD and MTTR for P1/P2 incidents.
  • Improved SLA compliance and reduction in escalation breaches.
  • Reduced repeat incidents via problem management and preventive actions.
  • Improved alert quality: lower false positives, better signal-to-noise ratio.
  • Strong PIR/RCA compliance: on-time RCAs with measurable preventive outcomes.
  • Improved NOC operational maturity: SOP adherence, shift handover quality, audit readiness.

Nice-to-Have Industry Contexts

  • Transportation / financial services / healthcare / e-commerce / SaaS environments with high availability targets.
  • Experience supporting microservices, Kubernetes, and distributed systems.

Salary: $110,000 - $120,000




