What are the responsibilities and job description for the HPC Storage Systems Administrator position at Hunter by HiringAgents.ai?
Job title: High-Performance Computing (HPC) Storage Systems Administrator
Client: Hunter Scouts
Location: Argonne, Illinois, United States - Remote
Contract type: Contract, 40 hours/week
Contract duration: Not specified
Salary:
About The Role
Hunter Scouts is recruiting for a DAOS System Administrator on behalf of a confidential national laboratory. This is a 100% remote contract opportunity to support and maintain large-scale high-performance computing (HPC) storage systems. You will work on advanced distributed storage (DAOS and analogous platforms), ensure system reliability and performance, diagnose and remediate incidents, automate operational tasks, and coordinate with vendors for escalations and upgrades.
Responsibilities
- Provide daily operations, maintenance, and support for HPE DAOS and related HPC storage systems
- Monitor system health, performance, and capacity and implement proactive remediation
- Diagnose incidents, perform root-cause analysis, and execute corrective actions
- Automate routine operational tasks using scripting and tooling
- Coordinate with internal teams and vendor (e.g., HPE) support for escalations, patches, and upgrades
- Perform routine system maintenance, including updates, configuration changes, and documentation
- Maintain documentation of configurations, procedures, and incident resolutions
Requirements
- Location: Must reside in the United States and be available to work remotely
- Authorization: Must be authorized to work in the U.S. without current or future employer sponsorship
- Experience: 3 years administering Linux systems in production environments
- Storage: 2 years operating high-performance and distributed storage systems (e.g., DAOS, Lustre, Ceph, GPFS/IBM Spectrum Scale) in large-scale environments
- Scripting & automation: 2 years scripting for automation (e.g., Bash, Python) to manage and operate storage systems
- Hardware & diagnostics: Experience with computer/server hardware installation, troubleshooting, and monitoring/diagnosing storage clusters using logs and diagnostic tools
Preferred Qualifications
- Prior production experience administering HPE DAOS
- Experience with HPC clusters (job schedulers, compute nodes, high-speed interconnects such as InfiniBand)
- Experience coordinating with vendor support (e.g., HPE) for escalations and root-cause analysis
- Experience with configuration management/automation tools (e.g., Ansible)
- Familiarity with HPC networking and storage performance concepts (RDMA, NUMA, InfiniBand)