What are the responsibilities and job description for the AI Operations Platform Consultant position at MSR Technology Group?
Duration: 6 Month Contract
Location: Charlotte, NC or Jersey City, NJ
Schedule: Hybrid (3 days onsite per week)
Type: W2 Only
Job Description
We are seeking an experienced AI Operations Platform Consultant to support the deployment, optimization, and operational management of Large Language Models (LLMs) in a production-grade, mission-critical environment. The ideal candidate has strong hands-on experience with Kubernetes, TensorRT-LLM, Triton Inference Server, and MLOps/LLMOps practices at scale. This role is highly technical, performance-driven, and crucial to the stability and availability of AI inference systems supporting enterprise workloads.
Key Responsibilities
- Deploy, manage, and operate containerized AI/LLM services at scale using Kubernetes and OpenShift.
- Configure, tune, and optimize LLMs with TensorRT-LLM and deploy inference services on NVIDIA Triton Inference Server.
- Manage and support end-to-end MLOps/LLMOps pipelines, ensuring reliable and automated model deployment workflows.
- Set up monitoring frameworks for AI inference services, focusing on performance, availability, latency, and throughput.
- Troubleshoot and resolve production issues related to LLM deployment, containerized environments, model performance, and load balancing.
- Operate mission-critical systems following enterprise standards for incident, event, and change management.
- Build and maintain scalable infrastructure supporting high-performance model serving in production.
- Deploy models into microservices architectures, ensuring robust API design and production stability.
- Configure, optimize, and troubleshoot Triton Inference Server deployments for high-throughput, low-latency inference (a minimal client-side sketch follows this list).
- Apply model optimization techniques including quantization, pruning, knowledge distillation, and TensorRT-LLM-based acceleration.
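For illustration only, here is a minimal client-side sketch in Python of the kind of Triton-serving work described above. It assumes a Triton Inference Server instance reachable at localhost:8000 and a TensorRT-LLM ensemble exposed under a hypothetical model name and tensor names ("ensemble", "text_input", "max_tokens", "text_output"); these identifiers are assumptions, not part of the posting, and would come from the actual model's config.pbtxt.

```python
import numpy as np
import tritonclient.http as httpclient

TRITON_URL = "localhost:8000"   # assumption: HTTP endpoint of the Triton server
MODEL_NAME = "ensemble"         # assumption: name of the deployed TensorRT-LLM ensemble

client = httpclient.InferenceServerClient(url=TRITON_URL)

# Basic liveness/readiness checks before routing traffic to this instance.
assert client.is_server_live()
assert client.is_model_ready(MODEL_NAME)

# Tensor names and dtypes depend on the model's config.pbtxt; the names
# below ("text_input", "max_tokens", "text_output") are illustrative only.
prompt = np.array([["Summarize the incident report."]], dtype=object)
max_tokens = np.array([[128]], dtype=np.int32)

inputs = [
    httpclient.InferInput("text_input", prompt.shape, "BYTES"),
    httpclient.InferInput("max_tokens", max_tokens.shape, "INT32"),
]
inputs[0].set_data_from_numpy(prompt)
inputs[1].set_data_from_numpy(max_tokens)

outputs = [httpclient.InferRequestedOutput("text_output")]

result = client.infer(MODEL_NAME, inputs=inputs, outputs=outputs)
print(result.as_numpy("text_output"))
```

The same readiness checks (is_server_live / is_model_ready) are what a deployment pipeline or Kubernetes readiness probe would typically gate on before shifting traffic to a new model version.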
Required Skills & Experience
- Hands-on experience running containerized applications at scale on Kubernetes/OpenShift.
- Strong expertise with LLM deployment, tuning, and optimization.
- Proficiency with TensorRT-LLM and Triton Inference Server in production environments.
- Deep knowledge of MLOps/LLMOps pipelines, CI/CD for model deployment, and automated inference workflows.
- Experience monitoring, load balancing, and optimizing high-performance inference systems (see the metrics sketch after this list).
- Familiarity with enterprise operational practices (incident/change/event management).
- Knowledge of model optimization and performance-enhancement techniques such as quantization, pruning, and knowledge distillation (see the quantization sketch after this list).
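As an illustration of the monitoring expectation, the sketch below scrapes Triton's Prometheus-format metrics endpoint (port 8002, /metrics by default) and derives an approximate average request latency from the cumulative counters. The metric names used (nv_inference_request_success, nv_inference_request_duration_us, nv_inference_count) reflect Triton's standard metric set but should be verified against the actual deployment; the endpoint URL is an assumption.

```python
import requests

METRICS_URL = "http://localhost:8002/metrics"   # assumption: Triton's default metrics port

# Metric names assumed from Triton's standard metric set; verify against
# the /metrics output of the actual deployment.
WATCHED = (
    "nv_inference_request_success",
    "nv_inference_request_duration_us",
    "nv_inference_count",
)

def scrape_metrics(url: str) -> dict[str, float]:
    """Fetch the Prometheus text exposition and sum samples per watched metric."""
    totals: dict[str, float] = {}
    for line in requests.get(url, timeout=5).text.splitlines():
        if line.startswith("#"):
            continue  # skip HELP/TYPE comment lines
        for name in WATCHED:
            if line.startswith(name):
                # A sample line looks like: metric_name{labels} value
                value = float(line.rsplit(" ", 1)[-1])
                totals[name] = totals.get(name, 0.0) + value
    return totals

if __name__ == "__main__":
    totals = scrape_metrics(METRICS_URL)
    requests_ok = totals.get("nv_inference_request_success", 0.0)
    duration_us = totals.get("nv_inference_request_duration_us", 0.0)
    if requests_ok:
        # Cumulative duration divided by cumulative successes gives a rough
        # average latency per request, in milliseconds.
        print(f"avg request latency: {duration_us / requests_ok / 1000:.2f} ms")
    print(totals)
```

In practice these counters would be scraped by Prometheus and alerted on (latency, error rate, throughput) rather than polled by a script; the sketch only shows where the numbers come from.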
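As a toy illustration of the optimization techniques named above, the sketch below applies post-training dynamic quantization with PyTorch to a small stack of linear layers. This is not the TensorRT-LLM quantization workflow (which goes through TensorRT-LLM's own build tooling); it only shows the general idea of trading weight precision for memory footprint and inference speed.

```python
import torch
import torch.nn as nn

# Toy stand-in for the linear-heavy layers that dominate LLM inference cost.
model = nn.Sequential(
    nn.Linear(1024, 4096),
    nn.ReLU(),
    nn.Linear(4096, 1024),
).eval()

# Post-training dynamic quantization: weights are stored as int8 and
# activations are quantized on the fly at inference time.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

with torch.no_grad():
    out = quantized(torch.randn(1, 1024))
print(out.shape)  # torch.Size([1, 1024])
```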