What are the responsibilities and job description for the Technical Senior Principal | Site Reliability Engineer | Kubernetes position at GM Financial?

Job Description

Why GMF Technology?

GM Financial is set to change the auto finance industry and is leading the way in tech modernization. We have a startup mindset and preserve our small-company culture in a public-company environment, with financial stability and intense growth over a decade-plus history. We are data junkies and trust in data and insights to advance our business objectives. We take our goal of zero emission, zero collision, zero congestion, and zero friction very seriously. We believe that, as an auto finance market leader, we are in the driver's seat to lead the GM EV mission to change the world. We are building global platforms in LATAM, Europe, China, the U.S., and Canada, and we are looking to grow our high-performing team. GMF comprises over 10,000 team members globally. Join our fintech culture within a blue-chip company where we are changing the way we use technology to support our customers, dealers, and business.

Flexible hybrid work environment (onsite 3 days a week/2 days remote) at our Arlington (AOC1), TX office.

Responsibilities

About the Role

As a Senior Principal SRE, you will be the technical bar‑raiser for our centralized Kubernetes platform—setting strategy, owning reliability at fleet scale, and leading cross‑org engineering to deliver a self‑service, secure, and compliant platform. You will partner with Architecture, BPS, Cloud Ops, and Cyber to turn our roadmap into durable, automated capabilities that product teams adopt with minimal toil.

Top Outcomes You Will Drive

Fleet‑level reliability strategy for shared and dedicated clusters, defining SLOs/SLIs and error budgets for the platform and golden patterns, with automated enforcement and reporting.
Self‑service at scale: deliver Namespace‑as‑a‑Service and developer‑portal workflows that shrink onboarding from weeks to hours and unlock safe autonomy for product teams.
Observability by default: land built‑in cluster/workload dashboards (Splunk APM, Azure Monitor/App Insights) and a robust RCA/Problem‑Management loop that closes the gap between incidents and engineering improvements.
Multi‑cloud readiness: guide centralized Kubernetes deployment expansion to AWS and design portable patterns (identity, networking, GitOps) that remain cloud‑agnostic.
Secure networking & policy: lead adoption of Calico Enterprise (DNS‑based policy, honey pods, central policy mgmt.) and staged rollout of stretched mesh/identity‑based access across clusters.
Path to Kubernetes-as-a-Serverless: influence the architecture that abstracts K8s, integrates pre‑connected services, and enforces governance/consistency with a service catalog and on‑demand APIs.
Scale the operating model: codify the RACI, reduce reactive workload, shift‑left with support enablement, and build automation that lets a small core team support a large fleet.

Core Responsibilities

Own multi‑cluster reliability: capacity modeling, failure domain strategy, upgrade design (blue/green, surge, or secondary‑cluster) and chaos/DR exercises across shared & dedicated environments.
Define and implement platform SLOs/SLIs (control plane, base stack, onboarding, GitOps, network policy propagation, secret/cert rotation) with automated alerts and error‑budget policies.
Lead the design/implementation of Namespace‑as‑a‑Service; measure adoption, lead time, and customer effort score.
Establish GitOps standards (Argo CD) for app and cluster configuration, including bootstrap, drift detection, and progressive delivery (blue/green, canary).
Architect and land Calico/Tigera Enterprise and/or service mesh patterns (east‑west controls, identity‑based policies, multi‑cluster traffic mgmt.), with guardrails and paved‑road configs.
Lead security & compliance by default: SR controls, RBAC baselines (Azure RBAC/workload identity), cert‑manager automation, patch cadence, and auditable change pipelines.
Serve as principal‑level incident commander and RCA owner for platform incidents; convert findings into backlog items, patterns, and training.
Partner with the necessary teams to scale operations and refine RACI; implement charge/show‑back models for high‑touch migrations when appropriate.
Mentor Staff/Principal engineers; raise the bar on design docs, ADRs, runbooks, and knowledge sharing across the platform and product teams.

Qualifications

What makes you a dream candidate?

Knowledge And Skills

Deep experience with GitOps (Argo CD), service mesh (Istio/Linkerd), Calico/Tigera, cert‑manager, secret engines, and workload identity.
Strong IaC/automation: Terraform, Azure DevOps (YAML), CI/CD policy gates, automated security controls.
Observability at scale: Splunk APM, Azure Monitor, Application Insights; golden dashboards and SLO pipelines.
Distributed systems fundamentals: performance, scalability, capacity, and reliability.
Excellent communication; ability to lead across org boundaries and mentor senior engineers.

Experience And Education

High School Diploma or equivalent required
Bachelor’s Degree or Associate Degree plus 2 additional years of relevant experience required
12 years in related function(s) required
5-7 years of experience leading through mentorship in related field required
5-7 years of experience driving thought leadership and innovation across products required

Preferred Skills

Multi‑cluster and multi‑region upgrade strategies (surge/blue‑green), active‑active patterns, and zero‑downtime migrations.
Network policy at scale (DNS‑based policies), L7 authorization, east‑west security controls.
Self‑service developer portals and onboarding workflows; measuring adoption and customer effort.
FinOps for Kubernetes (charge/show‑back, pod‑level cost breakdown), quota guardrails, and capacity/right‑sizing automation.
Experience with Kubernetes platform abstraction and curated service catalogs.
Expert in SRE: SLO/SLI design, error budgets, incident command, RCA/Problem Management, chaos/DR.

What We Offer: Generous benefits package available on day one to include: 401K matching, bonding leave for new parents (12 weeks, 100% paid), tuition assistance, training, GM employee auto discount, community service pay and nine company holidays.

Our Culture: Our team members define and shape our culture — an environment that welcomes innovative ideas, fosters integrity, and creates a sense of community and belonging. Here we do more than work — we thrive.

Compensation: Competitive pay and bonus eligibility

Work-Life Balance: Flexible hybrid work environment, 2 days a week in office

#GMFjobs
