Senior Site Reliability Engineer

VDart, Inc.
Atlanta, GA | Contractor
POSTED ON 6/9/2026
AVAILABLE BEFORE 7/9/2026

Job Title: Senior Site Reliability Engineer

Location: Atlanta, GA

Duration/Term: Contract

Experience Desired: 8 Years

 

Job Description:

CDP MISSION: Our mission is to be the authoritative source of truth for customer data — delivering timely, high-quality data at scale to power the contextual experiences that drive the growth of this company. Every customer profile must be accurate, trusted, and available when it matters, across every touchpoint, for the entire US adult population.

We are seeking a Senior Site Reliability Engineer to own production excellence for our Customer Data Platform (CDP), the authoritative source of truth for customer data across the entire US adult population.

An authoritative platform is only authoritative if it is available, secure, and timely. This role ensures exactly that: high availability, operational resilience, and compliance for the critical data systems that power customer experiences across every touchpoint. You will lead 24x7 production support, incident management, platform governance, and security compliance — ensuring CDP remains the trusted foundation the business depends on.

You will act as the bridge between engineering, platform, security, and compliance teams, driving the operational discipline that keeps CDP resilient, secure, and audit-ready at all times.

 

Job Responsibilities

KTLO (Keep the Lights On) Leadership and Production Support

  • Lead KTLO operations including 24x7 monitoring, incident management, and on-call processes — understanding that CDP downtime directly impacts customer experiences and business decisions
  • Oversee production support for data pipelines, APIs, and platform services across Azure and Databricks ecosystems
  • Manage job orchestration and monitoring (e.g., Control-M), ensuring SLA adherence and timely resolution — because timeliness is a core promise of the authoritative source of truth
  • Establish and enforce runbooks, SOPs, and escalation procedures tailored to CDP's criticality
  • Drive root cause analysis (RCA) and implement preventive measures to reduce recurring issues and protect data trust

 

Reliability Engineering and Operations

  • Improve system reliability through automation, observability, and proactive monitoring, and meet near-real-time data availability targets
  • Define and track SLAs, SLIs, and SLOs for critical CDP systems — with metrics aligned to data freshness, accuracy, and availability commitments
  • Partner with engineering teams to implement resiliency patterns, failover strategies, and capacity planning for population-scale data processing
  • Identify and eliminate operational bottlenecks and manual processes that threaten CDP's reliability and timeliness

 

Compliance, Security, and Governance

  • Lead execution of compliance mandates, audits, and regulatory requirements impacting CDP systems — ensuring the platform that holds data for the entire US adult population meets the highest security standards
  • Manage and remediate security violations, vulnerabilities, and policy breaches with urgency
  • Oversee access controls, audit readiness, and governance processes in collaboration with security teams — protecting the trust that makes CDP authoritative
  • Ensure adherence to data protection and privacy standards across all customer data systems

 

Platform Maintenance and Operational Hygiene

  • Manage patching, upgrades, and vulnerability remediation across CDP platforms
  • Lead password and credential rotation processes across systems and integrations
  • Ensure operational readiness for infrastructure and platform changes with zero-downtime deployment practices
  • Coordinate with vendors and platform teams for issue resolution and maintenance activities

 

Collaboration and Leadership

  • Lead and coordinate onshore/offshore support teams, ensuring effective coverage and handoffs for 24x7 operations
  • Partner with Data Engineering, AI/ML, and Platform teams to ensure operability and supportability of all CDP systems
  • Provide operational readiness reviews for new deployments and features before they enter production
  • Mentor team members and drive a culture of accountability, ownership, and continuous improvement

 

Education and Work Experience

  • Bachelor's degree in Computer Science, Engineering, or related field
  • 6 years of experience in production support, SRE, or platform operations roles
  • Proven experience managing 24x7 support models and distributed teams
  • Experience supporting large-scale data platforms in cloud environments (Azure preferred)
  • Experience with security compliance and audit processes for systems handling sensitive customer data

 

Technical Skills

  • Strong experience with the Azure ecosystem (ADLS, Databricks, ADF, Event Hubs, etc.)
  • Experience with job orchestration tools (Control-M or similar)
  • Solid understanding of data pipelines, ETL/ELT processes, and distributed systems at scale
  • Experience with monitoring and observability tools (e.g., Azure Monitor, Log Analytics, Splunk, Prometheus)
  • Familiarity with incident management tools and processes (PagerDuty, ServiceNow, etc.)
  • Experience with CI/CD pipelines and release management
  • Knowledge of security practices, access control, encryption, and compliance frameworks relevant to customer data
  • Scripting experience (Python, Shell) for automation and operational tooling

 

Knowledge, Skills, and Abilities

  • Strong operational mindset with unwavering focus on stability, reliability, and uptime for a platform the entire business depends on
  • Ability to manage high-pressure production incidents and drive resolution with urgency and precision
  • Deep understanding of why platform reliability and security are foundational to CDP's authority as the source of truth
  • Strong problem-solving and root cause analysis skills
  • Excellent coordination and communication across engineering, security, and business teams
  • Ability to balance short-term fixes with long-term reliability improvements
  • Leadership skills in managing global support teams and rotations

 

Key Skills:

SRE, Data Operations, Azure, Production Support, 24x7 monitoring, incident management

Salary: $60 - $65
