
Director of Engineering

XPerf Inc.
Round Rock, TX Full Time
POSTED ON 3/31/2026
AVAILABLE BEFORE 4/28/2026

About XPerf


XPerf AI is an early-stage AI infrastructure startup building software that optimizes GPU and ASIC utilization for large-scale AI compute clusters. We target private cloud, on-premise, and air-gapped datacenter environments — the underserved majority of enterprise AI deployments that hyperscaler-specific tooling leaves behind.


Our platform takes a vendor-agnostic approach across NVIDIA, AMD Instinct, and other accelerators, with a focus on execution efficiency: reducing wasted cycles, detecting training stragglers, and giving operators ground-truth visibility into what their hardware is actually doing.


We are a small, technical team moving fast. This is a foundational engineering leadership role, critical to the company's trajectory.


The Role: Director of Engineering


As Director of Engineering, you will own the technical execution of XPerf's core platform — from GPU observability and cluster instrumentation through Kubernetes-native orchestration and agentic AI application layers. You will work directly with the CEO and founding team to translate roadmap priorities into shipped software, make key architectural decisions, and build and lead a small, high-caliber engineering team.


You will drive the full product development lifecycle: defining application architecture and technical specifications, leading design reviews, breaking roadmap goals into development milestones, and actively participating in implementation and iteration. You will be accountable for quality, velocity, and on-time delivery across all engineering workstreams — setting the bar and holding the team to it.


This is a deeply hands-on role. You will be writing code, reviewing PRs, debugging complex distributed systems, and making real-time architecture calls — not delegating your way through problems.


What You'll Own


Platform Architecture & Engineering


  • Design and evolve XPerf's core observability stack — DCGM, Prometheus, Grafana Alloy, custom exporters for GPU/ASIC utilization metrics
  • Lead development of the performance optimization engine — straggler detection, execution efficiency delta reporting, workload-aware cluster analysis
  • Architect multi-tenant, multi-accelerator instrumentation pipelines targeting NVIDIA and AMD GPU hardware
  • Own the agentic AI application layer — RAG-based datacenter monitoring, automated troubleshooting, and intelligent alerting


Infrastructure, Kubernetes & Observability


  • Implement GitOps pipelines, Helm chart libraries, and CI/CD automation across microservice deployments
  • Own the full monitoring stack — Prometheus, Grafana Mimir/Thanos for multi-tenant scale, Loki for log aggregation, Alloy for collection
  • Build and maintain the xperf-gpu-metrics platform with tiered backends — Direct/CLI, Prometheus, and XPerf Platform modes
  • Develop NCCL/RCCL profiling hooks for deep training workload observability


Team & Process


  • Establish engineering best practices, code review standards, and development processes as the team scales
  • Collaborate with go-to-market on technical narratives, design partner integrations, and customer-facing engineering deliverables
  • Evaluate and onboard new hardware design partners across the GPU accelerator ecosystem


What We're Looking For


Required


  • 3–8 years of experience in software engineering, platform/infrastructure engineering, or AI application development, with at least three commercial enterprise software products delivered
  • Strong software engineering fundamentals — proficiency in Python and Go for production services, automation, operators, and tooling
  • AI application or agent development experience — RAG pipelines, LLM application layers, agentic frameworks, vector stores, or ML data pipelines — or a strong desire to own this layer as a core part of the role
  • Production Kubernetes experience — bare metal or GPU clusters, custom operators/controllers, CRDs, RBAC, Helm, GitOps
  • Hands-on GPU cluster operations — driver management, NCCL/RCCL testing, DCGM, GPU device plugins, distributed training debugging
  • Strong networking fundamentals — TCP/IP, InfiniBand/RoCE, SR-IOV, high-speed NICs, CNI internals
  • Familiarity with observability stacks — Prometheus, Grafana, and at least one long-term storage backend (Thanos, Mimir, Cortex)
  • Demonstrated ability to lead engineering delivery — driving design reviews, setting milestones, managing iteration cycles, and shipping on time with quality
  • Comfortable in early-stage environments — high ambiguity, fast iteration, and wearing multiple hats simultaneously


Strong Pluses


  • Experience with non-NVIDIA accelerators — AMD Instinct GPU software stack, ROCm, RCCL, and AMD datacenter tooling
  • AI/ML model development or training infrastructure experience — familiarity with distributed training frameworks such as DeepSpeed, Megatron, or PyTorch FSDP
  • Experience building or operating AI data pipelines — embedding, vector indexing, retrieval, or model serving infrastructure
  • LLM application architecture — multi-agent systems, tool-use frameworks, prompt engineering at scale, or fine-tuning pipelines
  • HPC or AI workload scheduling experience — familiarity with job schedulers such as Volcano, SLURM, or LSF, gang scheduling, priority queuing, and resource quota management for large-scale GPU clusters
  • SC (Supercomputing) conference participation or HPC community involvement


What You Won't Find Here


  • A bureaucratic engineering org with layers of approvals — you will have real autonomy
  • A purely managerial role — this position requires deep technical execution
  • NVIDIA-only thinking — we work across NVIDIA and non-NVIDIA accelerators and you should be comfortable with both
  • Hyperscaler-dependent assumptions — our customers run private clouds, on-prem clusters, and air-gapped environments


Compensation & Structure


  • Location: Round Rock, TX — on-site preferred, hybrid considered
  • Stage: Pre-Seed (post-announcement)
  • Compensation: Competitive base plus meaningful early equity
  • Reports To: CEO / Co-Founder
  • Team Size: Small founding team — first engineering hire at this level


How to Apply


  1. Send a connection request to Alex Carter, CEO and Co-Founder, indicating your interest in applying.
  2. Once the connection is confirmed, follow up with your resume and a cover letter. The cover letter should explain why you are interested in applying and describe the most complex infrastructure or GPU cluster problem you've solved. We care far more about your technical depth and judgment than the prestige of your past employers.
