Director, Production Services Manager

BNY External Career Site
New York, NY Full Time
POSTED ON 5/23/2026
AVAILABLE BEFORE 7/23/2026

Head of Production Services Governance, Incident & Problem Management

Role Summary

The Head of Production Services Governance, Incident & Problem Management is accountable for the enterprise governance, standards, and performance of Technology Incident Management and Problem Management (including root cause analysis) across BNY’s Platforms. This leader oversees a team that sets the operating model, drives consistent execution, improves quality and speed of restoration, and strengthens auditability and regulatory credibility.

The role is the senior point of accountability for:

  • Firm-wide incident/problem governance and ITIL-aligned standards
  • High-severity incident command and communications frameworks
  • End-to-end RCA quality and timeliness, including corrective/preventive actions
  • Regulatory and client-facing incident narratives and responses
  • Internal oversight engagement with groups such as the Office of Regulatory Relations (ORR) and the Enterprise Resiliency Office (ERO)
  • Automation and AI augmentation to modernize and scale incident/problem practices

This position partners closely with engineering, SRE/operations, cyber, resiliency, risk, compliance, and business stakeholders to ensure stability, transparency, and continuous improvement of production services.

 

Key Objectives

  1. Protect service availability and client experience by ensuring rapid restoration and disciplined incident handling.
  2. Improve resiliency and reduce repeat incidents through high-quality problem management, robust RCAs, and effective remediation governance.
  3. Strengthen governance and audit defensibility by ensuring consistent process adherence, evidence capture, and clear accountability.
  4. Modernize production governance through automation, AIOps capabilities, and AI-assisted workflows.
  5. Elevate operational excellence through measurable improvements in MTTR, recurrence, SLA adherence, and control effectiveness.
 

Primary Responsibilities

1) Enterprise Incident Management Governance (ITIL)

  • Own the Incident Management practice and ensure it is implemented consistently across Platform Production Services and aligned to ITIL principles.
  • Establish and maintain incident taxonomy, severity models, prioritization rules, escalation paths, and functional/organizational RACI.
  • Define the Major Incident Management (MIM) framework: incident command roles, war-room orchestration, communications cadence, stakeholder engagement, and decision rights.
  • Ensure end-to-end controls: accurate incident logging, categorization, impact assessment, timeline reconstruction, evidence retention, and closure criteria.
  • Drive performance through standard KPIs (e.g., MTTA/MTTR, reopen rate, SLA compliance, major incident frequency, customer-impact minutes, incident backlog health).

2) Enterprise Problem Management & RCA Excellence (ITIL)

  • Own the Problem Management practice including proactive problem identification, trending, and prevention of recurrence.
  • Establish RCA standards (methodologies such as 5 Whys, fishbone, fault tree, “cause–trigger–control gap” framing) and ensure consistent quality across teams.
  • Govern Corrective and Preventive Action (CAPA) management: remediation backlog, prioritization, due dates, owner accountability, and validation of effectiveness.
  • Maintain governance for Known Errors and Workarounds, enabling faster recovery and better knowledge reuse.
  • Drive systemic improvements by connecting incidents/problems to resiliency risks, architectural weaknesses, control gaps, and engineering quality.

3) Regulatory, Client, and Executive Communications & Responses

  • Serve as the accountable executive for regulatory responses and supervisory requests relating to incidents, outages, recovery actions, RCA findings, and resiliency improvements.
  • Lead firm readiness for time-sensitive regulatory deliverables—ensuring accuracy, consistency, and defensible evidence.
  • Coordinate and quality-assure client communications for impactful incidents (internal/external statements, timelines, cause, remediation, and prevention).
  • Provide clear executive narratives and materials for senior leadership, risk committees, audit committees, and business stakeholders.

4) Oversight & Partnership Model (ORR, ERO, Risk, Audit, Compliance)

  • Act as the primary interface to internal oversight groups (e.g., ORR, ERO, Operational Risk, Compliance, Internal Audit, and Technology Risk Management).
  • Ensure incidents/problems are appropriately mapped to relevant governance constructs (e.g., operational risk events where applicable) with clear traceability.
  • Lead continuous improvement of control coverage and evidence quality to support audits and examinations.
  • Partner with Resiliency teams to connect operational learning to scenario testing, dependency mapping, recovery planning, and service resiliency metrics.

5) Standardization, Quality Assurance, and Continuous Improvement

  • Build and run a Quality Management System for incident/problem practices: sampling, assurance reviews, coaching, playbooks, and maturity assessments.
  • Develop and maintain standard artifacts (runbooks, major incident playbooks, comms templates, RCA templates, PIR guidance).
  • Run Continual Improvement programs: trend analysis, “top drivers” remediation themes, performance benchmarking, and maturity roadmaps.
  • Drive adoption of consistent tooling, workflows, and data standards across platforms.

6) Automation & AI Enablement (AIOps / Intelligent Operations)

This role is expected to use AI responsibly to improve speed, quality, and scale of incident/problem management while meeting security, privacy, and model-risk expectations.

Key AI and automation outcomes include:

  • AI-assisted triage: classification, routing, deduplication, and severity recommendation based on history and signals.
  • Correlation and probable-cause insights using telemetry, topology, and change data to identify the likely blast radius and suspected causes.
  • Automation for repetitive tasks: stakeholder updates, timeline capture, evidence packaging, and post-incident documentation generation.
  • RCA acceleration: AI-supported timeline reconstruction, log summarization, anomaly explanation, and “similar incident” retrieval.
  • Knowledge management uplift: automated drafting of knowledge articles/workarounds; improvement suggestions based on recurrence patterns.
  • Governance for AI usage: model transparency, human-in-the-loop controls, data handling, audit logs, and bias/quality monitoring.

 

7) Leadership & Talent Development

  • Lead and develop a high-performing team of incident/problem governance professionals (e.g., problem managers, automation analysts).
  • Establish role clarity, training paths, and ITIL-aligned capability development.
  • Foster a culture of calm, disciplined execution during crises and a learning culture post-incident—focused on prevention, not blame.
 

Scope & Decision Rights

  • Enterprise-level authority to define and enforce incident/problem standards and minimum controls.
  • Authority to convene major incident response, direct escalations, and require timely executive updates.
  • Authority to gate incident/problem closure based on quality criteria (documentation, evidence, RCA completeness, CAPA commitments).
  • Joint governance with engineering/production leaders to prioritize remediation work and measure effectiveness.
 

Key Interfaces

  • Platform Production Services leaders, SRE/Operations, Engineering, Architecture
  • Cybersecurity Operations, Fraud/Financial Crime Technology (as relevant)
  • Enterprise Resiliency Office (ERO)
  • Office of Regulatory Relations (ORR)
  • Operational Risk, Compliance, Legal, Privacy
  • Internal Audit, Technology Risk Management
  • Business/Product leadership and client coverage teams
 

Required Qualifications

  • 10–15 years in technology operations, SRE/production services, service management, or resiliency roles in complex enterprises; regulated financial services strongly preferred.
  • Demonstrated leadership in Major Incident Management and Problem Management/RCA at enterprise scale.
  • Strong command of ITIL practices (Incident, Problem, Monitoring & Event, Service Level, Change Enablement, Continual Improvement; familiarity with CMDB/Service Configuration is a plus).
  • Proven experience driving process standardization, operating model change, and measurable performance improvements (e.g., MTTR reduction, recurrence reduction).
  • Experience leading regulatory/audit-facing responses with strong evidence discipline and executive communication.
 

Preferred Qualifications / Certifications

  • ITIL 4 Managing Professional (MP) and/or ITIL Strategic Leader (SL); ITIL Foundation minimum.
  • Familiarity with ISO/IEC 20000, NIST, and resiliency/operational risk expectations in financial services (helpful but not required).
  • Experience with AIOps platforms/observability tooling (e.g., event correlation, log analytics, tracing, anomaly detection).
  • Experience with Agile/DevOps/SRE operating models and integrating incident/problem practices into product/platform delivery.

Core Competencies (What “Great” Looks Like)

  • Crisis leadership: calm command presence, structured decision-making, clear communications under pressure.
  • Governance rigor: sets standards that are pragmatic, scalable, and audit-defensible.
  • Analytical excellence: uses trends and data to drive prevention, not just restoration.
  • Influence without friction: partners effectively with engineering leaders to get remediation done.
  • Automation mindset: removes manual steps, improves quality through workflow and tooling.
  • AI fluency with controls: leverages AI safely with strong human oversight and evidence trails.
 

Success Metrics (Illustrative)

  • Reduced major incident frequency and customer-impact minutes (YoY).
  • Improved MTTR/MTTA and decreased escalations due to better routing/triage.
  • Improved RCA timeliness and quality scores, fewer incomplete RCAs, and higher on-time CAPA completion.
  • Reduced repeat incidents driven by top recurring causes.
  • Improved audit/regulatory outcomes: fewer findings, faster response cycles, higher evidence quality.
  • Increased automation coverage: % of incidents with AI-assisted classification/correlation; reduction in manual documentation hours.

Salary.com Estimation for Director, Production Services Manager in New York, NY
$135,649 to $173,235