Sr. Site Reliability Engineer

Cox Automotive
Mission, KS Full Time
POSTED ON 12/20/2025
AVAILABLE BEFORE 2/17/2026

The Site Reliability Engineer - Incident Response is a critical enterprise-level role responsible for accelerating incident resolution and enhancing the overall incident management process. This individual partners with engineering teams during active incidents to troubleshoot issues using monitoring and logging tools, and post-incident, delivers executive-level summaries that clearly communicate impact, root cause, and resolution. The SRE - Incident Response also plays a key role in analyzing incident response effectiveness and identifying opportunities for systemic improvements.

Core Competencies and Qualifications:

  • Bachelor's degree in a related discipline and 4 years' experience in a related field. The right candidate could also have a different combination, such as a master's degree and 2 years' experience; a Ph.D. and up to 1 year of experience; or 16 years' experience in a related field.
  • Applicants must currently be authorized to work in the United States for any employer without current or future sponsorship. No OPT, CPT, STEM/OPT, or visa sponsorship now or in the future.
  • Engineering/Tooling: Demonstrates the ability to design, build, and maintain engineering solutions and tools that enhance reliability, automate incident response, and reduce operational toil.
  • Incident Troubleshooting: Skilled in interpreting logs, metrics, and traces to assist in identifying root causes during live incidents.
  • Monitoring & Observability: Proficient in tools such as Datadog, Splunk, New Relic, or similar platforms.
  • Strong programming background in Python, Java, or C#, with experience building, maintaining, and troubleshooting production-grade services and automation tools.
  • Proven ability to design and implement reliable, scalable, and highly available systems, leveraging software engineering best practices to improve system resilience and operational efficiency.
  • Experience developing automation and tooling to reduce toil, improve incident response, and support continuous improvement across monitoring, deployment, and recovery processes.
  • Ability to collaborate closely with software engineering teams to influence architecture and operational readiness, ensuring reliability is built into the system from design through production.
  • AI-Centric Engineering: Effectively leverages artificial intelligence (AI) and machine learning (ML) tools to automate, optimize, and enhance daily engineering and incident response tasks.
  • Analytical Rigor: Strong attention to detail in validating incident data and identifying trends or gaps in response.
  • DevOps & Architecture Knowledge: Understanding of full-stack systems, CI/CD pipelines, caching, scaling, and cloud-native infrastructure.
  • Metrics & Reporting: Capable of calculating and interpreting key metrics like MTTA (Mean Time to Acknowledge) and MTTR (Mean Time to Resolve).

Here are the responsibilities of this role when not tied to active on-call:

Post-Incident Review Development

  • Draft and deliver executive summaries post-incident.
  • Develop and coach teams on blameless postmortems.
  • Create templates, train facilitators, and help guide root cause analysis (e.g., 5 Whys, fishbone diagrams).
  • Maintain a central library of learnings and cross-cutting themes.

Incident Process Improvement

  • Actively support engineering teams during incidents by helping diagnose and resolve issues quickly.
  • Navigate and analyze data from observability platforms to make informed inferences about root causes.
  • Analyze the effectiveness of incident response to identify systemic reliability gaps.
  • Standardize incident response workflows (incident roles, comms, escalation paths).
  • Create or refine runbooks, incident command frameworks, and severity classification guides.

Metrics and Insights

  • Build dashboards around incident frequency, MTTR, MTTA, and recurrence rates.
  • Use incident data to drive reliability OKRs or engineering investments.

Tooling & AI Solutions

  • Partner with engineering teams to identify repetitive or high-impact tasks suitable for automation.
  • Develop, implement, and continuously improve custom scripts, bots, and AI-driven workflows for monitoring, alerting, and incident triage.
  • Evaluate and integrate emerging AI/ML technologies to optimize detection, root cause analysis, and reporting.
  • Ensure all tools and automations are secure, maintainable, and aligned with organizational standards and SRE best practices.
  • Document and socialize new tools and AI solutions, enabling adoption and knowledge sharing across teams.

Cross-Team Collaboration

  • Collaborate with Engineering Managers and Incident Commanders to gather and validate incident data.
  • Partner with product teams, infra, and leadership to socialize reliability best practices.
  • Act as a reliability "consultant" to squads that have impactful incidents.
  • Recommend enhancements to monitoring, alerting, and response processes to reduce future incident impact.

USD 99, ,000.00 per year

Compensation:

Compensation includes a base salary of $99,[...] - $165,[...]. The base salary may vary within the anticipated base pay range based on factors such as the ultimate location of the position and the selected candidate's knowledge, skills, and abilities. Position may be eligible for additional compensation that may include an incentive program.

Benefits:

The Company offers eligible employees the flexibility to take as much vacation with pay as they deem consistent with their duties, the company's needs, and its obligations; seven paid holidays throughout the calendar year; and up to 160 hours of paid wellness annually for their own wellness or that of family members. Employees are also eligible for additional paid time off in the form of bereavement leave, time off to vote, jury duty leave, volunteer time off, military leave, and parental leave.
