What are the responsibilities and job description for the Onsite/Hybrid Role - Senior AI Platform / LLM Infrastructure Engineer - Charlotte, NC position at Quantum World Technologies Inc.?
Role: Senior AI Platform / LLM Infrastructure Engineer
Location: Charlotte, NC (Hybrid)
Duration: Long Term Contract
We are hiring a Senior AI Platform Engineer to build and optimize on-premises LLM inference platforms. The role focuses on high-performance model serving, GPU workloads, and scalable ML infrastructure built on modern inference frameworks and Kubernetes.
Must-Have Skills
LLM Inference Frameworks: vLLM, TensorRT-LLM, Triton Inference Server, SGLang
Inference Optimization: Continuous Batching, Speculative Decoding, KV Cache / Prefix Caching, Quantization (FP8 / AWQ / GPTQ)
Distributed/Parallel Systems: Tensor Parallelism
Platform & Orchestration: Kubernetes, KServe, OpenShift AI, Helm / Operators
GPU & Performance: CUDA, NCCL, MIG, GPU Orchestration (Run:AI)
Monitoring: Prometheus, Grafana, ML Observability
Programming: Python
GenAI Tools: Arize AI, Claude (CoWork)
Load / Performance Testing: GuideLLM, Locust