What are the responsibilities and job description for the HPC Systems Engineer position at Jobs via Dice?
Job ID: 2610670
Location: Charlottesville, VA, US
Date Posted: 2026-03-26
Category: Engineering and Sciences
Subcategory: Systems Engineer
Schedule: Full-Time
Shift: Day Job
Travel: No
Minimum Clearance Required: Top Secret
Clearance Level Must Be Able to Obtain: TS/SCI
Potential for Remote Work: No (100% on-site)
Description
SAIC is looking for a highly qualified HPC Systems Engineer to support the Army's Golden Dome initiative. The engineer will support the deployment and sustainment of Linux-based High Performance Computing (HPC) cluster environments used for distributed compute workloads, simulation environments, and GPU-enabled processing.
The environment will include:
- multi-node Linux compute clusters
- workload scheduling platforms such as Slurm or PBS
- cluster provisioning frameworks (e.g., xCAT, Warewulf)
- high-performance networking technologies including RDMA / InfiniBand
- distributed parallel compute workloads utilizing MPI or OpenMP
- GPU-enabled compute resources supporting CUDA-based processing
Candidates should be comfortable working within cluster-scale computing environments where performance, scheduler configuration, and distributed workload execution are critical operational factors.
The HPC Systems Engineer will support the build-out, configuration, and sustainment of HPC cluster platforms.
The role focuses on:
- cluster platform configuration
- scheduler administration
- distributed compute troubleshooting
- performance analysis across compute, storage, and network layers
- GPU compute workload support
- automation and operational tooling
Core Technical Capabilities
Candidates should demonstrate capability in most of the following areas.
HPC Cluster Platforms
Experience supporting multi-node Linux compute clusters, including node integration, configuration, and operational sustainment.
Experience with cluster provisioning tools such as xCAT, Warewulf, or similar node deployment systems is beneficial.
Workload Scheduling Platforms
Experience supporting distributed compute workloads using schedulers such as:
- Slurm
- PBS / PBS Pro
- Torque
- Grid Engine
Candidates should understand how workload schedulers interact with distributed compute workloads and containerized execution environments.
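As an illustration of the scheduler administration work described above, a minimal Slurm batch script might look like the following (job name, node counts, partition name, module name, and binary are all hypothetical placeholders, not taken from this posting):

```shell
#!/bin/bash
#SBATCH --job-name=sim-run        # job name shown in squeue output
#SBATCH --nodes=4                 # hypothetical node count for the job
#SBATCH --ntasks-per-node=32      # MPI ranks launched on each node
#SBATCH --gres=gpu:2              # request 2 GPUs per node, if the partition provides them
#SBATCH --time=02:00:00           # wall-clock limit (HH:MM:SS)
#SBATCH --partition=compute       # hypothetical partition name; site-specific

module load mpi                   # environment-module name is site-specific
srun ./simulation_binary          # srun launches one task per allocated slot
```

Submitted with `sbatch script.sh`, the scheduler queues the job until the requested nodes, GPUs, and time window are available, then launches the distributed tasks via `srun`.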
Linux Systems Administration
Strong Linux administration experience including:
- command-line system administration
- server and compute node configuration
- system troubleshooting in distributed compute environments
Distributed and Containerized Workloads
Experience supporting distributed compute workloads utilizing parallel computing frameworks such as:
- MPI
- OpenMP
- GPU compute frameworks
Familiarity with container technologies commonly used in HPC environments such as:
- Docker
- Podman
- Singularity / Apptainer
Experience supporting containerized HPC workloads or integrating container platforms with cluster infrastructure is desirable.
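As a sketch of the container-to-cluster integration mentioned above (the image and application names are hypothetical), an Apptainer image can be pulled from a registry and executed under the scheduler:

```shell
# Pull a container image; Apptainer converts Docker images to SIF format
# and writes ubuntu_22.04.sif by default.
apptainer pull docker://ubuntu:22.04

# Run a hypothetical MPI application inside the container across two nodes;
# srun starts the containerized task on each allocated slot.
srun --nodes=2 apptainer exec ubuntu_22.04.sif ./mpi_app
```
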
HPC Networking
Familiarity with high-performance networking technologies including:
- RDMA networking
- InfiniBand
- high-throughput cluster networking architectures
GPU Compute Environments
Experience supporting GPU-enabled compute environments and workloads utilizing CUDA frameworks is desirable.
Automation and Operational Tooling
Experience writing scripts or operational tooling using languages such as:
- Bash
- Python
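A minimal sketch of the kind of operational tooling listed above, written in Bash. The node-state sample below is an embedded hypothetical stand-in; in practice the data would come from a scheduler query such as `sinfo -N -h -o "%N %t"`:

```shell
#!/bin/bash
# Triage helper: flag compute nodes in problem states so an operator can
# follow up. Sample data stands in for live scheduler output.
sample_output="node001 idle
node002 alloc
node003 down
node004 drain"

# Select nodes whose state is down or drain.
bad_nodes=$(echo "$sample_output" | awk '$2 == "down" || $2 == "drain" {print $1}')
bad_count=$(echo "$bad_nodes" | grep -c .)

echo "Nodes needing attention: $bad_count"
echo "$bad_nodes"
```

Running against the sample data reports two nodes (node003, node004) needing attention; pointing the same pipeline at live `sinfo` output turns it into a quick health-check tool.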
Qualifications
Candidates must meet the following requirements:
- Bachelor's degree in a science or technology field; ten additional years of experience may be substituted for the degree
- Eight years of experience is required
- Minimum 6 years of experience administering Linux systems in enterprise, research computing, or distributed compute environments
- An active Top Secret clearance is required; an active TS/SCI clearance must be obtained prior to beginning work
- 100% onsite support in Charlottesville, VA
- Experience supporting distributed compute environments or HPC cluster platforms
- Experience working with workload schedulers such as Slurm, PBS, Torque, or similar systems
- Experience administering Linux systems through command-line interfaces
- Experience with scripting or automation tools (Bash, Python, or similar)
- Ability to obtain required DoD 8140 (8570) IAT Level II certification
Candidates must have direct experience with HPC or distributed compute environments. The following additional qualifications are beneficial:
- Administration of multi-node HPC cluster environments
- Experience with parallel or distributed file systems such as Lustre, BeeGFS, or GPFS
- Experience supporting GPU-enabled compute environments and CUDA workloads
- Experience with configuration management tools such as Ansible or Puppet
- Experience supporting research, laboratory, or mission computing environments
- Experience supporting systems within DoD/DoW or IC environments