Senior Kubernetes Engineer - HPC / GPU

GTN Technical Staffing
Dallas, TX | Full Time
POSTED ON 4/25/2026
AVAILABLE BEFORE 6/19/2026

Senior Kubernetes Engineer (GPU / AI Platforms)

Location: Dallas, TX (Hybrid)

Type: Direct Hire

• Competitive base salary plus performance bonus

• 100% company-paid benefits

• Relocation available

Overview

We are seeking a Senior Kubernetes Engineer to design, build, and optimize GPU-accelerated container platforms supporting large-scale HPC, AI/ML workloads, and next-generation CaaS / GPUaaS environments.

This role is focused on enabling scalable, multi-tenant compute platforms that power GPU-as-a-Service (GPUaaS) and Container-as-a-Service (CaaS) offerings across hybrid and on-prem infrastructure. You will work at the intersection of Kubernetes and the NVIDIA ecosystem, driving performance, efficiency, and reliability for high-throughput, GPU-intensive workloads.

The ideal candidate brings deep hands-on experience building production-grade Kubernetes platforms for AI and HPC workloads, along with strong development skills and a passion for high-performance, distributed systems at scale.

Key Responsibilities

Kubernetes Platform Engineering

  • Architect, deploy, and operate Kubernetes clusters optimized for GPU-intensive and multi-tenant workloads
  • Design platforms supporting CaaS / GPUaaS delivery models, ensuring scalability, resilience, and performance
  • Leverage NVIDIA GPU Operator, Network Operator, and DCGM for cluster management and observability

GPU Enablement & Scheduling

  • Integrate NVIDIA device plugins, MIG, and GPU sharing capabilities into Kubernetes scheduling frameworks
  • Optimize GPU utilization and workload placement using scheduler extensions and batch schedulers (kube-scheduler plugins, Volcano, Slurm)
  • Support AI/ML training, LLM workloads, and scientific computing at scale

Automation & Platform Development

  • Develop and maintain Kubernetes operators and custom controllers
  • Automate platform provisioning and lifecycle management using Go or Python
  • Implement Infrastructure-as-Code using Terraform, Helm, and Kustomize

Observability & Performance Optimization

  • Implement monitoring and telemetry using Prometheus, Grafana, DCGM Exporter, and OpenTelemetry
  • Drive performance tuning, capacity planning, and optimization across GPU-enabled clusters
  • Support incident response and ensure production readiness

Security & Multi-Tenancy

  • Design secure, multi-tenant Kubernetes environments using RBAC, namespaces, and policy enforcement (OPA, Gatekeeper)
  • Ensure workload isolation and governance across shared GPU infrastructure
  • Support secure platform operations across CaaS / GPUaaS environments

DevOps & CI/CD

  • Build and maintain CI/CD pipelines using GitOps tools such as Argo CD and Flux
  • Support continuous delivery and lifecycle management of Kubernetes-based platforms

Cross-Functional Collaboration

  • Partner with HPC, AI/ML, DevOps, and platform engineering teams to support high-performance workloads
  • Collaborate on platform architecture, optimization strategies, and operational best practices

Required Experience

  • Extensive experience operating Kubernetes in production-scale environments
  • Strong experience supporting HPC, AI/ML, or GPU-accelerated infrastructure
  • Experience designing or supporting CaaS, GPUaaS, or multi-tenant platform environments
  • Deep expertise with the NVIDIA and Kubernetes ecosystems, including GPU Operator, device plugins, NVML, MIG, and DCGM
  • Strong understanding of Kubernetes internals (CRDs, RBAC, controllers, scheduler extensions)
  • Proficiency in Go or Python for automation and operator development
  • Experience supporting GPU-intensive workloads (LLMs, AI/ML pipelines, HPC applications)
  • Hands-on experience with Helm, Kustomize, and GitOps workflows

Technical Skills

  • Monitoring and observability: Prometheus, Grafana, DCGM Exporter, OpenTelemetry
  • Networking: CNI plugins (NVIDIA CNI, Multus), service networking, cluster networking concepts
  • Infrastructure-as-Code: Terraform, Helm, Kustomize
  • CI/CD and GitOps practices

Preferred Experience

  • Experience with container runtimes (containerd, CRI-O, NVIDIA Container Toolkit)
  • Exposure to advanced networking solutions such as Cilium
  • Contributions to open-source projects within Kubernetes or NVIDIA ecosystems
  • Experience working in large-scale HPC or AI infrastructure environments

Additional Requirements

  • This position requires applicants to be currently authorized to work in the U.S. without employer sponsorship.
  • We are unable to sponsor or take over sponsorship of employment visas at this time.

Salary: $150,000 - $230,000
