AI Operations & Infrastructure Engineer

Invictus International Consulting, LLC
Fort Meade, MD Full Time
POSTED ON 6/24/2026
AVAILABLE BEFORE 6/22/2031

Title: AI Operations & Infrastructure Engineer

Location: Fort Meade, MD

Clearance: TS/SCI with a CI Polygraph

 

Job Details:

  • Manage and maintain AI computing platforms, including GPUs and other specialized hardware
  • Install and configure GPU drivers and software
  • Oversee the AI software stack and tools
  • Implement and manage containerization technologies like Docker and Kubernetes
  • Configure and optimize networking infrastructure for AI workloads, including InfiniBand and Ethernet
  • Manage storage solutions for AI data, considering performance and capacity requirements
  • Deploy and manage data processing units (DPUs) to accelerate data center workloads
  • Monitor and manage AI cluster health and resource utilization
  • Implement workload management and scheduling tools like Slurm and Kubernetes
  • Ensure efficient power and cooling for AI infrastructure to maintain optimal operating conditions
  • Configure high-performance networking solutions for AI and machine learning workloads
  • Optimize network performance to ensure maximum throughput and minimal latency for AI computations
  • Implement and fine-tune network protocols to enhance data transfer speeds and efficiency
  • Integrate NVIDIA networking products with existing AI infrastructure, including servers, GPUs, and storage systems
  • Deploy networking solutions in data centers to ensure seamless connectivity between AI components
  • Diagnose and resolve networking issues impacting AI workloads to maintain optimal system performance
  • Provide technical support and guidance to teams managing AI infrastructure
  • Collaborate with data scientists, researchers, and IT professionals to understand networking requirements and challenges
  • Lead deployment and validation of servers and systems for AI-enabled platforms
  • Configure and manage network topologies, BMC, OOB, TPM, power, and cooling
  • Install, upgrade, and validate GPU-based servers, BlueField DPUs, cables, and transceivers
  • Perform firmware upgrades, hardware validation, and storage setup
  • Configure and administer physical and logical resources, including MIG partitioning and BlueField platforms
  • Install and configure operating systems, cluster software, drivers, containers (Docker), and NGC CLI
  • Manage and orchestrate clusters using NVIDIA Base Command Manager, Slurm, Pyxis, Enroot, and Run:ai
  • Perform stress, benchmarking, and burn-in tests using HPL, NCCL, NVIDIA NeMo, and ClusterKit
  • Verify cabling, firmware/software versions, and network signal quality
  • Troubleshoot and resolve hardware, software, storage, and performance faults
  • Replace faulty components and optimize systems for AMD/Intel platforms
  • Monitor, document, and report on cluster health, resource usage, and job performance
  • Ensure secure, efficient, and scalable operation of NVIDIA AI infrastructure, including user access and workload management

 

Requirements:

  • Qualified candidates must hold an active NVIDIA Professional Certification in AI Networking, AI Infrastructure, or AI Operations
  • Prior direct, hands-on professional experience administering NVIDIA GPU and data processing unit (DPU) technologies, AI software stacks, and data center environments for high-performance AI workloads
  • Comprehensive expertise in deploying and maintaining AI compute platforms, including proficiency in containerization and workload orchestration with Docker, Kubernetes, Slurm, NVIDIA Base Command Manager, and Run:ai
  • Ability to configure physical and logical resources, including Multi-Instance GPU (MIG) partitioning and BlueField platforms, while overseeing critical facility elements such as power, cooling, and storage solutions
  • Advanced skills in AI networking, specifically configuring and optimizing high-performance InfiniBand and Ethernet fabrics to ensure maximum throughput and minimal latency
  • Current active TS/SCI clearance with a CI Polygraph

Equal Opportunity Employer/Veterans/Disabled

Salary.com Estimation for AI Operations & Infrastructure Engineer in Fort Meade, MD
$112,602 to $145,008