
HPC Systems Administrator

Empire AI
Buffalo, NY | Full Time
POSTED ON 5/15/2026
AVAILABLE BEFORE 8/21/2026

About Empire AI

Empire AI is establishing New York as the national leader in responsible artificial intelligence. It is backed by a consortium of top academic and research institutions, including Columbia University, Cornell University, NYU, CUNY, RPI, SUNY, the University of Rochester, RIT, Mount Sinai, and the Flatiron Institute.

By leveraging the state's rich academic resources and research institutions, Empire AI is driving innovation in fields such as medicine, education, energy, and climate change. It gives New York's researchers access to computing resources that are often prohibitively expensive and otherwise available only to big tech companies, fueling statewide innovation, driving economic growth, and preparing a future-ready AI workforce to tackle society's most complex challenges.

The initiative is funded by $500 million in public and private investment from a State Capital Grant, academic institutions, the Simons Foundation, the Flatiron Institute, and Tom Secunda (co-founder of Bloomberg).


Position Summary

The HPC Systems Administrator will administer, optimize, and support the high-performance computing platforms that power Empire AI's AI/ML workloads, scientific research, and large-scale simulation across its statewide consortium. Reporting to the Manager, AI/ML Systems Administration, this role is responsible for the day-to-day cluster operations, job scheduling, GPU resource management, and systems reliability of Empire AI's distributed HPC infrastructure.

This role ensures that Empire AI's shared computing environments remain available, performant, and accessible to researchers across partner institutions. The HPC Systems Administrator works at the intersection of systems administration, AI/ML infrastructure support, and research computing, bridging the gap between complex user workloads and the underlying HPC platform.


Duties and Responsibilities

HPC Cluster Administration

  • Deploy, configure, and maintain Linux-based HPC clusters (Rocky/Ubuntu) at scale, including compute, GPU, storage, and management nodes
  • Administer and optimize the Slurm workload manager, including partition design, QOS policies, fair-share accounting, and cross-institutional workload orchestration models
  • Manage NVIDIA GPU resources (H100/H200/GB200) including driver, CUDA, firmware, and NCCL lifecycle management for AI training and inference workloads
  • Administer cluster management platforms such as NVIDIA Base Command Manager (BCM) for provisioning and system lifecycle management
  • Support containerized and virtualized research environments using Apptainer/Singularity, Pyxis, and Enroot
  • Troubleshoot performance bottlenecks, including MPI/NCCL collective traffic patterns and rail-optimized topologies for LLM and AI workloads
  • Administer parallel file systems such as Lustre and VAST and integrate them with cluster storage workflows
  • Establish incident alerting and escalation procedures for the HPC clusters and supporting infrastructure
  • Manage detailed monitoring dashboards (Prometheus, Grafana) to track critical metrics: network throughput, GPU utilization, cluster health, and job telemetry


AI/ML Infrastructure Support

  • Architect and support systems for AI training and inference pipelines, including large language models (LLMs) and multimodal AI workloads
  • Tune and benchmark systems for GPU-intensive AI/ML frameworks including PyTorch and TensorFlow
  • Work with research faculty to translate scientific goals into technical configurations and workload requirements
  • Evaluate emerging HPC hardware and software solutions and propose procurement recommendations aligned with AI/ML workload demands


Security & Compliance

  • Enforce security baselines, access control policies, and network segmentation across HPC environments
  • Integrate robust monitoring, alerting, access control, and disaster recovery planning into cluster operations
  • Partner with the Security & Compliance specialist to ensure security is integrated into system design and workload orchestration


Collaboration & Documentation

  • Consult with research teams across consortium institutions to assess computational needs and advise on workflow optimization
  • Translate user feedback and researcher requirements into system-level improvements and configuration optimizations
  • Maintain clear system documentation, configuration guides, runbooks, and architecture diagrams


Minimum Qualifications

  • Bachelor's degree in Computer Science, Engineering, or a related technical field
  • 5 years of hands-on experience administering Linux-based HPC clusters in production environments, supporting research or scientific computing projects
  • Expertise with job schedulers (e.g., Slurm) and GPU computing
  • Familiarity with AI/ML frameworks, container environments (Apptainer/Singularity, Pyxis, Docker), and distributed storage systems
  • Working knowledge of InfiniBand networking (subnet management, UFM, OpenSM) and/or RoCEv2/Ethernet HPC fabrics
  • Proficiency in Bash and Python scripting for automation and systems administration
  • Experience with monitoring stacks: Prometheus, Grafana, or equivalent
  • Demonstrated success collaborating with researchers or supporting scientific computing projects


Preferred Qualifications

  • Experience with NVIDIA Base Command Manager (BCM), NVIDIA UFM, or DGX SuperPOD infrastructure
  • Familiarity with workload patterns and infrastructure needs for training, tuning, and deploying large-scale AI/ML models
  • Proficiency in infrastructure automation, configuration management, and version control tools such as Ansible and Git
  • Experience supporting or collaborating within academic or industry research environments focused on artificial intelligence, machine learning, or large-scale data science
