
Hybrid || LLM Inference & GPU Systems Consultant || Charlotte, NC

Technogen, Inc.
Charlotte, NC Contractor
POSTED ON 5/25/2026
AVAILABLE BEFORE 6/24/2026

TECHNOGEN, Inc. has been a proven leader in full IT services, software development, and solutions for 15 years.

TECHNOGEN is a Small & Woman Owned Minority Business with GSA Advantage Certification. We have offices in VA and MD, and offshore development centers in India. We have successfully executed 100 projects for clients ranging from small businesses and non-profits to Fortune 50 companies and federal, state, and local agencies.


Description:
Local candidates preferred.

Role Overview:
We are seeking an AI Infrastructure Runtime Engineer to build and maintain large-scale on-prem LLM infrastructure. The environment is an enterprise private GenAI platform running on NVIDIA H200 GPU clusters with an OpenShift AI deployment ecosystem. You will manage production inference in-house, including self-hosting open-source LLMs such as Llama. The role is focused exclusively on inference; it involves no model training infrastructure or fine-tuning pipelines.

Key Responsibilities
NVIDIA GPU Runtime Optimization: Drive runtime efficiency across the token generation pipeline, specifically prefill/decode optimization and KV cache management.
Inference Serving: Deploy and manage inference engines, including vLLM and TensorRT-LLM.
Hardware Utilization: Tune GPU throughput, batching strategies, and latency. Orchestrate workloads using RunAI and Kubernetes GPU orchestration.
Model Lifecycle Management: Oversee the complete Hugging Face model lifecycle, including model onboarding, deployment, and retirement.
Platform Operations: Operate and maintain the OpenShift AI ecosystem as the primary container platform for GenAI workloads.

Required Qualifications
8 years of experience working as an LLM Systems Engineer or AI Infrastructure Runtime Engineer.
8 years of hands-on experience with NVIDIA H200 clusters and runtime optimization techniques (KV cache, prefill/decode).
Proficiency in OpenShift AI and GPU orchestration tools such as RunAI.
Strong experience with modern inference frameworks, specifically vLLM and TensorRT-LLM.
Proven track record of managing the Hugging Face deployment lifecycle.
Must be onsite at the client in Charlotte, NC at least 3 days/week.

Hourly Wage Estimation for Hybrid || LLM Inference & GPU Systems Consultant || Charlotte, NC
$68.00 to $88.00




