What are the responsibilities and job description for the On-prem Platform Engineer (LLM, Gen AI) position at Ampstek?

Hi,

Hope you are doing great!

We have an urgent opening with my client, described below. Please reply if you are interested.

Role : On-prem Platform Engineer

Location : Charlotte, NC

Long-term contract

Must-Have Skills (Mandatory Keywords)

LLM Inference & Optimization

• vLLM, TensorRT-LLM, Triton Inference Server, SGLang

• Inference optimization techniques:

o Continuous batching

o Speculative decoding

o KV cache / Prefix caching

• Model optimization:

o FP8, AWQ, GPTQ

Distributed & GPU Systems

• Tensor parallelism and large model scaling

• CUDA, NCCL, GPU architecture

• GPU partitioning & optimization (MIG)

Kubernetes & ML Serving

• Kubernetes-based ML serving platforms

• KServe, OpenShift AI

• Helm charts, Operators, platform automation

GPU Orchestration

• Run:AI or similar GPU scheduling/orchestration platforms

• Multi-tenant GPU workload management

Platform Engineering

• Experience building internal AI/ML platforms (on-prem or hybrid)

• Strong automation and system design mindset

Observability & Performance

• Prometheus, Grafana

• ML observability (model latency, throughput, drift, resource utilization)

• Performance benchmarking and tuning

Good to Have / Preferred Skills

• Experience with LLMOps / GenAI pipelines

• Exposure to hybrid cloud (on-prem integration with GCP/Azure)

• Familiarity with Inferentia / alternative accelerators

• Knowledge of service mesh / networking in GPU clusters

Responsibilities

• Build, configure, and operate on-prem Kubernetes/OpenShift AI platforms for deploying and serving GenAI models and LLM inference workloads.

• Design and optimize high-performance inference stacks using vLLM, TensorRT-LLM, Triton Inference Server, SGLang, and advanced techniques (continuous batching, speculative decoding, KV caching).

• Manage GPU orchestration and capacity using Run:AI, MIG, CUDA/NCCL, and tensor parallelism to maximize utilization and throughput.

• Deploy and operate Kubernetes-based ML serving with KServe, Helm charts, and Operators for scalable, reliable model serving.

• Drive inference optimization and benchmarking using FP8, AWQ, and GPTQ, along with performance tools such as GuideLLM and Locust.

• Implement observability and ML monitoring using Prometheus, Grafana, and Arize AI to ensure SLA/SLO compliance for GenAI services.

• Collaborate with ML and research teams to onboard new models, tune inference performance, and productionize GenAI use cases.

Thanks and Regards

Rohit Pathak | Technical Recruiter

rohit.k@ampstek.com | www.ampstek.com

Direct No: 1 609-527-8934
