
LLM Inference & GPU Systems Consultant

ARK Infotech Spectrum
Charlotte, NC · Contractor
POSTED ON 5/19/2026
AVAILABLE BEFORE 6/18/2026

Role: LLM Inference & GPU Systems Consultant

Location: Charlotte, NC (locals only)

We are seeking an AI Infrastructure Runtime Engineer to build and maintain large-scale, on-premises LLM infrastructure. This is an enterprise private GenAI environment running on NVIDIA H200 GPU clusters and an OpenShift AI deployment ecosystem. You will manage production inference in-house, including self-hosting open-source LLMs such as Llama. The team focuses exclusively on inference; this role involves no model-training infrastructure or fine-tuning pipelines.

Key Responsibilities
NVIDIA GPU Runtime Optimization: Maximize the runtime efficiency of the token-generation pipeline, specifically prefill/decode optimization and KV cache management.
Inference Serving: Deploy and manage inference engines including vLLM and TensorRT-LLM.
Hardware Utilization: Tune GPU throughput, batching strategies, and latency. Manage GPU workload orchestration with RunAI and Kubernetes.
Model Lifecycle Management: Oversee the complete Hugging Face model lifecycle, including model onboarding, deployment, and retirement.
Platform Operations: Operate and maintain the OpenShift AI ecosystem as the primary container platform for GenAI workloads.

Required Qualifications
8 years of experience working as an LLM Systems Engineer or AI Infrastructure Runtime Engineer.
8 years of hands-on experience with NVIDIA H200 clusters and runtime optimization techniques (KV cache, prefill/decode).
Proficiency in OpenShift AI and GPU orchestration tools like RunAI.
Strong experience with modern inference frameworks, specifically vLLM and TensorRT-LLM.
Proven track record managing the Hugging Face deployment lifecycle.
Must work onsite at the client location in Charlotte, NC, at least 3 days per week.

Hourly Wage Estimation for LLM Inference & GPU Systems Consultant in Charlotte, NC
$64.00 to $80.00

Job openings at ARK Infotech Spectrum

  • ARK Infotech Spectrum Jersey City, NJ
  • Role: Infrastructure Operations Consultant - Core Banking; Location: Jersey City, NJ [100% onsite]; Language Requirement: English proficiency required; Role...
  • 1 Day Ago

  • ARK Infotech Spectrum New York, NY
  • Role: C/C++ Developer; Location: NYC, NY (onsite); local candidates only, as the interview may be in person; 10 years of experience · C/C++ Development Experienc...
  • 2 Days Ago

  • ARK Infotech Spectrum Coppell, TX
  • Role: Sr. AI Engineer; Location: Coppell, TX (onsite); Must have: LangGraph (graphs, tool-nodes, memory/state, streaming agents), IBM Watsonx Orchestrate (s...
  • 2 Days Ago

  • ARK Infotech Spectrum Atlanta, GA
  • Role: Android Developer; Location: Atlanta, GA (onsite); Exp: min. 12 years; Job Description: Software Development Engineer (mobile perf testing experience a PLU...
  • 3 Days Ago


Not the job you're looking for? Here are some other LLM Inference & GPU Systems Consultant jobs in the Charlotte, NC area that may be a better fit.

  • Technogen, Inc. Charlotte, NC
  • TECHNOGEN, Inc. is a Proven Leader in providing full IT Services, Software Development and Solutions for 15 years. TECHNOGEN is a Small & Woman Owned Minor...
  • 1 Day Ago

  • Wells Fargo Charlotte, NC
  • About This Role: Wells Fargo is seeking a Generative AI Senior Software Engineer for Cloud and LLM API Systems within Digital Technology - AI Capability Eng...
  • 21 Days Ago
