What are the responsibilities and job description for the Senior MLOps / LLMOps Engineer position at ITCAPS LLC?
Job Title: Senior MLOps / LLMOps Engineer, Kubernetes & AI Inference Platforms
Duration: 2 Months
Location: New Jersey
Job Summary
We are seeking a highly skilled Senior MLOps / LLMOps Engineer to design, deploy, and support enterprise-scale AI/LLM platforms in production environments. The ideal candidate will have strong experience with Kubernetes/OpenShift, NVIDIA TensorRT-LLM, Triton Inference Server, and scalable AI infrastructure. This role focuses on building reliable, secure, and high-performance inference platforms for mission-critical AI applications.
Key Responsibilities
- Deploy, manage, and troubleshoot containerized AI/LLM applications on Kubernetes/OpenShift platforms.
- Configure, optimize, and support LLM inference workloads using NVIDIA TensorRT-LLM and Triton Inference Server.
- Design and maintain scalable MLOps/LLMOps and container deployment pipelines.
- Build CI/CD workflows for AI models, containers, and infrastructure deployments.
- Package and deploy AI models across UAT, testing, and production environments.
- Monitor platform performance, GPU utilization, availability, and operational health.
- Implement logging, alerting, monitoring, and automated operational support processes.
- Troubleshoot model deployment, scaling, networking, and load balancing issues.
- Support model optimization techniques including quantization, pruning, and performance tuning.
- Create operational runbooks, deployment procedures, health checks, and support documentation.
- Support backup, restore, disaster recovery, failover, and business continuity planning.
- Maintain platform security, RBAC, compliance, and governance standards.
- Collaborate with AI, infrastructure, DevOps, and operations teams to deliver scalable AI solutions.
Required Qualifications
- 5 years of experience in Kubernetes/OpenShift administration and containerized environments.
- Strong hands-on experience with NVIDIA TensorRT-LLM and Triton Inference Server.
- Experience deploying and supporting LLM/AI inference services in production.
- Strong knowledge of Docker, microservices, and API-based architectures.
- Experience building and supporting MLOps/LLMOps pipelines and CI/CD workflows.
- Expertise in monitoring, logging, and troubleshooting distributed systems.
- Experience with NVIDIA GPU infrastructure and AI workload optimization.
- Understanding of incident management, change management, and operational best practices.
- Strong problem-solving, communication, and collaboration skills.
Preferred Qualifications
- Experience with OpenShift AI and enterprise AI platforms.
- Knowledge of model optimization and inference acceleration techniques.
- Experience with cloud platforms such as AWS, Azure, or Google Cloud Platform.
- Familiarity with Infrastructure as Code (Terraform, Ansible, Helm, etc.).
- Kubernetes/OpenShift or cloud certifications are a plus.
Salary: $70 - $80