Engineering Manager, HPC Platform

GTN Technical Staffing
Dallas, TX | Full Time
POSTED ON 4/15/2026
AVAILABLE BEFORE 5/14/2026

Location: Dallas, TX (Hybrid)

Type: Direct Hire

• Competitive base salary plus performance bonus

• 100% company-paid benefits

• Relocation available

Overview

We are seeking an Engineering Manager, HPC Platform to lead the design, scaling, and operational excellence of a bare-metal Kubernetes platform powering HPC, AI/ML workloads, and next-generation CaaS / GPUaaS environments.

This organization operates at the forefront of high-performance computing and AI infrastructure, building platforms that support large-scale research, simulation, and production workloads. This role will lead a team responsible for delivering multi-tenant, GPU-accelerated compute platforms, enabling GPU-as-a-Service (GPUaaS) and Container-as-a-Service (CaaS) capabilities across distributed data center environments.

This is a hands-on leadership role focused on platform performance, reliability, and automation. You will define the technical roadmap, guide system architecture, and ensure the platform delivers high-throughput, low-latency performance at scale for distributed HPC and AI workloads.

Key Responsibilities

Leadership & Team Development

  • Lead, mentor, and grow a team of engineers building and scaling HPC and Kubernetes-based platform infrastructure
  • Foster a culture of ownership, operational excellence, and continuous improvement
  • Drive alignment across engineering, platform, and infrastructure teams

Platform Architecture & Engineering

  • Architect and scale a bare-metal Kubernetes platform supporting HPC, AI/ML, and CaaS / GPUaaS workloads
  • Design and optimize multi-tenant GPU and CPU environments, including workload isolation, scheduling, and resource management
  • Define architecture patterns for high-performance, distributed compute platforms

GPU Platform & Workload Optimization

  • Optimize GPU utilization, scheduling, and performance across large-scale clusters
  • Support AI/ML training, LLM workloads, and scientific computing at scale
  • Ensure efficient workload orchestration across Kubernetes and HPC scheduling environments

Automation, SRE & Platform Operations

  • Drive automation using Infrastructure-as-Code (Terraform, Ansible) and CI/CD pipelines
  • Implement SRE best practices for reliability, observability, and incident response
  • Build scalable operational frameworks supporting large, multi-tenant compute environments

Performance, Reliability & Capacity Planning

  • Own platform performance, uptime, and scalability across thousands of nodes
  • Define and track KPIs for system health, utilization, and performance
  • Lead capacity planning and forecasting aligned with rapid compute growth

Cross-Functional Collaboration

  • Partner with research, storage, and networking teams to integrate distributed filesystems and high-speed interconnects (InfiniBand, RoCE)
  • Collaborate with hardware and software vendors to improve platform capabilities and deployment efficiency
  • Align platform architecture with evolving HPC, AI, and GPUaaS / CaaS delivery models

Required Experience

  • 7+ years of experience in infrastructure, platform, or SRE engineering, including 2+ years in a technical leadership role
  • Proven experience operating Kubernetes environments for HPC, AI/ML, or GPU-accelerated workloads
  • Experience designing or supporting CaaS, GPUaaS, or multi-tenant compute platforms
  • Deep expertise in Linux systems, networking, and performance optimization on bare-metal infrastructure
  • Experience managing large-scale distributed clusters and integrating storage and high-performance networking
  • Strong experience with automation tools (Terraform, Ansible) and observability platforms (Prometheus, Grafana, Loki)
  • Strong communication and leadership skills with the ability to translate technical direction into execution

Preferred Experience

  • Familiarity with HPC schedulers (Slurm, Flux) and hybrid scheduling models
  • Experience with container runtimes (containerd, CRI-O) and Kubernetes internals
  • Contributions to open-source Kubernetes, HPC, or ML infrastructure projects
  • Experience operating in hyperscale or AI-focused infrastructure environments

Additional Requirements

  • This position requires applicants to be currently authorized to work in the U.S. without employer sponsorship.
  • We are unable to sponsor or take over sponsorship of employment visas at this time.

Salary.com Estimation for Engineering Manager, HPC Platform in Dallas, TX
$140,506 to $168,108