Senior HPC DevOps Engineer

Peraton
College Park, MD Full Time
POSTED ON 5/5/2026
AVAILABLE BEFORE 5/4/2027

Responsibilities

Peraton Labs is seeking a polygraph-cleared Senior HPC DevOps Engineer to own the operations and automation lifecycle for an existing HPC/AI compute cluster (Linux). You will work closely with Peraton team members, as well as directly with our Maryland-based customer, in a fast-paced environment at a customer site. In this role you will codify repeatable operations in Ansible and drive execution through an enterprise automation controller to enforce desired state, detect drift, accelerate node onboarding, and streamline incident response via runbook automation integrated with monitoring and ITSM.

This position requires full-time on-site work at a customer site near College Park, MD.

Key responsibilities may include:

  • Automation ownership: Own and manage automation workflows, including job templates, inventories, credentials, RBAC configurations, execution environments, and promotion across environments.
  • Desired-state and drift detection: Enforce desired state across cluster services via code-driven configuration; implement drift detection and alert on deviations; reconcile runtime state vs configured state.
  • Compute node onboarding (Bare-metal/VM): Build and maintain an automated node bootstrap workflow that installs/configures the OS, applies security and performance baselines, enrolls nodes into the scheduler and shared storage ecosystem, validates hardware and service readiness (CPU, network, accelerator, storage mounts), and reports pass/fail results.
  • Patch & vulnerability response: Implement rolling maintenance and patch automation to meet defined vulnerability response SLAs. Maintain version-controlled container build definitions and integrate image scanning into the build/release lifecycle.
  • Logging & observability: Ensure automation and operational workflows emit auditable logs to centralized analytics and integrate with metrics/alerting to enable reliable incident response, proactive detection, and safe auto-remediation.
  • Incident/problem management: Automate responses to common incidents (hung nodes, storage performance alarms, image vulnerabilities, hardware failures) leveraging out-of-band hardware management interfaces and standardized runbooks.
  • Docs-as-code: Keep runbooks and operational documentation versioned alongside automation and publish operator guidance to the organization's documentation platform.

Qualifications

Required qualifications

  • 12 years of experience with a BS in computer science, IT, or a related technical field; 10 years of experience with an MS; or 8 years of experience with a Ph.D. Four additional years of experience are required in lieu of a bachelor's degree, for a total of 16 years of experience.
  • 7 years in Linux systems / SRE / DevOps, including production cluster operations in an HPC or large-scale compute environment.
  • 3 years of experience building and operating Ansible automation at scale (roles/collections, idempotency, inventories, secrets).
  • Strong Linux hardening & compliance fundamentals (SELinux/AppArmor, SSH key automation, baseline config management).
  • Demonstrated experience operating or automating clustered compute environments (HPC, large Linux farms, or similar).
  • Hands-on experience with container tooling in Linux environments, including image lifecycle/versioning.
  • Familiarity with incident response and runbook-driven operations; ability to automate common remediations.
  • Strong Git workflow and documentation practices.
  • Must hold at least one active/current technical certification from the following:
    • Systems engineering (e.g., INCOSE)
    • Information security (e.g., CISSP)
    • Networking (e.g., CCNA)
    • System Administration (e.g., RHCE, MCSE)
    • Virtualization (e.g., VCP)
    • IT systems management (e.g., ITIL)
    • Project management (e.g., PMP, Agile)
  • Active TS/SCI security clearance with a current polygraph is required.

Preferred qualifications

  • Bare-metal provisioning experience (PXE/iPXE, Kickstart/Preseed, Foreman/MAAS) and hardware OOB management.
  • CI/testing for automation and promotion pipelines for playbooks.
  • Experience with tuned performance profiles, HPC performance troubleshooting, and GPU node health validation.

Peraton Overview

Peraton is a next-generation national security company that drives missions of consequence spanning the globe and extending to the farthest reaches of the galaxy. As the world’s leading mission capability integrator and transformative enterprise IT provider, we deliver trusted, highly differentiated solutions and technologies to protect our nation and allies. Peraton operates at the critical nexus between traditional and nontraditional threats across all domains: land, sea, space, air, and cyberspace. The company serves as a valued partner to essential government agencies and supports every branch of the U.S. armed forces. Each day, our employees do the can’t be done by solving the most daunting challenges facing our customers. Visit peraton.com to learn how we’re keeping people around the world safe and secure.

Target Salary Range

$146,000 - $234,000. This represents the typical salary range for this position. Salary is determined by various factors, including but not limited to, the scope and responsibilities of the position, the individual’s experience, education, knowledge, skills, and competencies, as well as geographic location and business and contract considerations. Depending on the position, employees may be eligible for overtime, shift differential, and a discretionary bonus in addition to base pay.

EEO

EEO: Equal opportunity employer, including disability and protected veterans, or other characteristics protected by law.
