What are the responsibilities and job description for the Associate Director Application Support Engineering position at DTCC Candidate Experience Site?
Do you want to work on innovative projects, collaborate with a dynamic and supportive team, and receive investment in your professional development? At DTCC, we are at the forefront of innovation in the financial markets. We are committed to helping our employees grow and succeed. We believe that you have the skills and drive to make a real impact. We foster a thriving internal community and are committed to creating a workplace that looks like the world that we serve.
The Information Technology group delivers secure, reliable technology solutions that enable DTCC to be the trusted infrastructure of the global capital markets. The team delivers high-quality information through activities that include essential development work, building infrastructure capabilities to meet client needs, and implementing data standards and governance.
Pay and Benefits:
- Competitive compensation, including base pay and annual incentive
- Comprehensive health and life insurance and well-being benefits, based on location
- Pension / Retirement benefits
- Paid Time Off and Personal/Family Care, and other leaves of absence when needed to support your physical, financial, and emotional well-being.
- DTCC offers a flexible/hybrid model of 3 days onsite and 2 days remote (onsite Tuesdays, Wednesdays and a third day unique to each team or employee).
Your Primary Responsibilities:
- Participate in design reviews, sprint zero, and delivery planning to champion non-functional requirements (NFRs), including resiliency, observability, fault tolerance, holiday and special-day processing, and disaster recovery.
- Collaborate with Major Release Management to ensure each Risk release meets SRE standards for observability and resiliency (SLIs/SLOs, monitoring, knowledge base articles). Ensure releases are subject to required deployment validations.
- Define and evolve monitoring, alerting, SLIs, and SLOs, leveraging AI/ML‑driven analytics for anomaly detection, incident correlation, and early risk identification.
- Make design recommendations that enable quick detection of outage conditions and allow the application to recover without manual intervention, and/or create knowledge-base guidance for the application support team to follow to improve application recovery times. Participate in major incident response and root cause analysis to drive continual, systemic improvements in recovery time.
- Drive automation and intelligent tooling (including AI‑assisted remediation) to reduce manual toil and improve consistency and recovery times.
- Attend project management meetings with application support (EAS L2) to present operational readiness and raise any operational risks and concerns. Test NFRs in UAT environments to validate effectiveness and completeness of operational capabilities. Validate operational readiness prior to release with stakeholders, partner with Embedded Risk and Security teams, and proactively surface and mitigate technology and operational risks.
- Lead capacity planning and performance analysis to ensure Risk platforms scale reliably under high load.
- Establish KPIs and operational metrics to demonstrate reliability improvements and operational maturity.
- Build a strong SRE culture, enhanced by AI-driven insights, across Risk Application Support and Development through mentorship and best-practice coaching; leverage approved AI tools to analyze code, collaborate on knowledge base articles, and accelerate improvements in observability, performance, security, and maintainability.
Qualifications:
- Minimum of 8 years of related technical and management experience
- Bachelor's degree preferred or equivalent experience
- Cloud certifications are a plus
Talents Needed for Success:
- Proven experience with SRE or DevOps practices, including CI/CD pipelines, infrastructure as code, and automation frameworks
- Strong understanding of monitoring and observability platforms (e.g., Grafana) and experience designing and fine‑tuning robust monitoring systems
- Programming proficiency in one or more languages such as Python, Java, Go, or similar, for automation and tooling development
- Familiarity with cloud platforms, containerized environments, and/or hybrid infrastructure models
- Experience in financial services, capital markets, or regulated environments
- Demonstrated participation in disaster recovery, performance, and resiliency testing
- Knowledge of AI concepts, data platforms, messaging systems, and large‑scale batch or real‑time processing systems
- Strong collaboration skills across technology and business teams
- Hands-on experience leading and participating in incident and problem management, including root cause analysis
The salary range is indicative for roles at the same level within DTCC across all US locations. Actual salary is determined based on the role, location, individual experience, skills, and other considerations. We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, sex, gender, gender expression, sexual orientation, age, marital status, veteran status, or disability status. We will ensure that individuals with disabilities are provided reasonable accommodation to participate in the job application or interview process, to perform essential job functions, and to receive other benefits and privileges of employment. Please contact us to request accommodation.