What are the responsibilities and job description for the HPC Production Engineer position at NJF Global Holdings Ltd?
HPC Production Engineer – Deep Systems & Performance
I’m currently partnered with a leading global trading firm looking for a highly specialized HPC Production Engineer to support their large-scale, mission-critical compute and storage environments. This role is ideal for engineers who thrive on deep technical challenges, working with high-performance systems that power complex workloads.
This is not a generalist IT role—we’re looking for someone who can dig into HPC internals, troubleshoot complex production issues, and optimize systems across compute, storage, and network layers.
Responsibilities
- Design, implement, and maintain large-scale HPC compute and storage infrastructure.
- Monitor and optimize system performance, diagnose complex production incidents, and perform root cause analysis.
- Build and maintain tooling for software deployment, OS upgrades, and cluster provisioning.
- Implement and maintain performance and fault monitoring systems.
- Collaborate with researchers and engineers to analyze workloads and optimize HPC performance.
- Provide operational support on a rotating on-call schedule.
Required Experience
- 5 years of hands-on HPC production experience, including parallel filesystems (Lustre, GPFS, or similar) and batch schedulers (Slurm, Grid Engine, or equivalent).
- Strong Linux systems administration skills, including kernel, memory, I/O, and process-level debugging.
- Programming/scripting proficiency in Go, Python, C/C , or similar.
- Experience designing, building, and operating complex distributed systems.
- Familiarity with configuration management and automation tools (Ansible, SaltStack, Puppet, etc.).
Ideal Candidate
- Can explain HPC internals in depth, including system behavior under load.
- Has experience troubleshooting production HPC issues end-to-end.
- Comfortable working across compute, storage, and network layers.
- Hands-on, collaborative, and capable of operating high-performance infrastructure at scale.
This is an excellent opportunity for HPC engineers who enjoy working at the forefront of large-scale computing environments, solving challenging technical problems, and supporting mission-critical systems.
Salary : $200,000 - $300,000