What are the responsibilities and job description for the Senior Kubernetes Platform Engineer position at Jobs via Dice?

Dice is the leading career destination for tech experts at every stage of their careers. Our client, XPath Solutions LLC, is seeking the following. Apply via Dice today!

Senior Kubernetes Platform Engineer

ML / GenAI Infrastructure | Terraform | Cloud-Native

In-Person Interview Is Non-Negotiable

Location: Charlotte, NC - On-Site/Hybrid

Employment Type: Contract-to-Hire

Experience: 7–12 Years (at least 5 years hands-on Kubernetes)

Industry: Enterprise AI / Cloud Infrastructure

⸻

About The Role

We are looking for a Senior Kubernetes Platform Engineer to design, build, and operate mission-critical Kubernetes infrastructure that powers large-scale Machine Learning (ML) and Generative AI (GenAI) workloads.

This is not a standard Kubernetes admin role. You will act as a subject matter expert, driving architecture decisions across scheduling, networking, security, storage, and multi-tenancy. You will work closely with ML engineers, researchers, and application teams to build scalable, GPU-optimized platforms that accelerate AI innovation.

⸻

Key Responsibilities

Kubernetes Platform Engineering

Design, deploy, and manage multi-cluster Kubernetes environments (EKS, GKE, AKS)
Build advanced Kubernetes components including CRDs, Operators, admission webhooks, and custom schedulers
Optimize Kubernetes for GPU workloads (NVIDIA device plugins, MIG, time-slicing)
Implement autoscaling solutions (HPA, VPA, KEDA, Cluster Autoscaler)
Enforce security using RBAC, OPA/Gatekeeper, and Pod Security Standards
Manage a service mesh (Istio / Linkerd) for secure and observable microservices
Configure networking (Cilium, Calico), ingress controllers, and network policies
Lead cluster lifecycle management (upgrades, backups, disaster recovery)
Package platform components using Helm and Kustomize

⸻

ML / GenAI Infrastructure

Design ML pipelines using Kubeflow, Argo Workflows, or Ray
Build scalable model serving platforms (KServe, Triton, TorchServe, vLLM)
Optimize distributed compute using Ray on Kubernetes
Design storage solutions for ML datasets and artifacts (EFS, GCS, NFS, etc.)
Enable GPU-backed environments (JupyterHub, Kubeflow Notebooks)
Deploy and manage vector databases for RAG applications
Optimize LLM inference (batching, caching, multi-GPU scaling)

⸻

Infrastructure as Code (Terraform)

Develop and maintain reusable Terraform modules for cloud infrastructure
Implement remote state management and multi-environment workflows
Enforce best practices: versioning, drift detection, policy-as-code
Integrate Terraform into CI/CD pipelines and GitOps workflows
Use tools like Atlantis or Terraform Cloud for automated deployments

⸻

Observability, Security & Reliability

Build an observability stack (Prometheus, Grafana, Loki, Jaeger/Tempo)
Implement audit logging and runtime security (Falco, SIEM integration)
Define SLOs/SLIs and maintain platform reliability
Perform GPU capacity planning and cost optimization
Lead incident response and post-mortem analysis

⸻

Required Skills & Technologies

Kubernetes (Expert level)
Terraform (Advanced)
Helm / Kustomize
AWS / Google Cloud Platform / Azure (EKS, GKE, AKS)
Istio / Linkerd
Argo Workflows / Kubeflow / Ray
KServe / Triton
Prometheus / Grafana
Cilium / Calico
OPA / Gatekeeper
NVIDIA GPU Operator
Docker / containerd
GitOps tools (ArgoCD / Flux)
Python / Go / Bash
Linux systems and networking

⸻

Required Qualifications

At least 7 years in cloud/platform engineering
At least 5 years of hands-on Kubernetes in production
Deep understanding of Kubernetes internals (control plane, CNI, CSI, etc.)
Experience running GPU-based ML/AI workloads at scale
Strong Terraform expertise (modules, CI/CD, multi-cloud)
Experience with ML orchestration tools (Kubeflow, Argo, or Ray)
Proficiency in at least one programming language (Python, Go, or Bash)
Experience with GitOps and secure container practices
CKA (Certified Kubernetes Administrator) certification

⸻

Preferred Qualifications

CKS (Certified Kubernetes Security Specialist) certification
CKAD certification
Cloud DevOps certifications (AWS / Google Cloud Platform)
Terraform certification
Experience with Crossplane or multi-cluster management
Familiarity with eBPF tools (Hubble, Pixie)
Contributions to CNCF or open-source Kubernetes ecosystem

⸻

What You’ll Deliver (First 90 Days)

Day 30: Audit existing Kubernetes clusters and deliver a gap analysis
Day 60: Implement Terraform-managed clusters with security and observability
Day 90: Deploy a production-ready model serving platform with SLO dashboards

⸻

Who You Are

A systems thinker with a strong platform mindset
Proactive and automation-driven
Comfortable working cross-functionally with ML and engineering teams
Influential communicator who can drive architecture decisions
Security-focused and reliability-driven

⸻

Why Join Us

This role is ideal for engineers passionate about Kubernetes and AI infrastructure who want to build the backbone of next-generation enterprise AI platforms.
