What are the responsibilities and job description for the LLM Inference & GPU Systems Consultant position at ARK Infotech Spectrum?

Role: LLM Inference & GPU Systems Consultant

Location: Charlotte, NC (locals only)

We are seeking an AI Infrastructure Runtime Engineer to build and maintain large-scale on-prem LLM infrastructure. This is a private enterprise GenAI environment running on NVIDIA H200 GPU clusters, with OpenShift AI as the deployment ecosystem. You will run production inference in-house, including self-hosting open-source LLMs such as Llama. The focus is exclusively on inference; this role involves no model training infrastructure or fine-tuning pipelines.

Key Responsibilities
NVIDIA GPU Runtime Optimization: Maximize the runtime efficiency of the token generation pipeline, with specific ownership of prefill/decode optimization and KV cache management.
Inference Serving: Deploy and manage inference engines including vLLM and TensorRT-LLM.
Hardware Utilization: Tune GPU throughput, batching strategies, and latency. Manage workload orchestration using RunAI and Kubernetes GPU orchestration.
Model Lifecycle Management: Oversee the complete Hugging Face model lifecycle, including model onboarding, deployment, and retirement.
Platform Operations: Operate and maintain the OpenShift AI ecosystem as the primary container platform for GenAI workloads.

Required Qualifications
8 years of experience as an LLM Systems Engineer or AI Infrastructure Runtime Engineer.
8 years of hands-on experience with NVIDIA H200 clusters and runtime optimization techniques (KV cache, prefill/decode).
Proficiency in OpenShift AI and GPU orchestration tools like RunAI.
Strong experience with modern inference frameworks, specifically vLLM and TensorRT-LLM.
Proven track record managing the Hugging Face deployment lifecycle.
Must be on-site at the client location in Charlotte, NC, at least 3 days per week.
