What are the responsibilities and job description for the On-Prem Cloud Engineer position at Success In Cloud, Inc.?
Title: Cloud Engineer
Location: Brevard, Charlotte
Experience: 5 to 8 yrs
W2, C2C
Must Have: Arize AI, Claude Cowork, GCP, Terraform
Technical Skills Required:
vLLM, TensorRT-LLM, Triton Inference Server, SGLang, Inference Optimization, Continuous Batching, Speculative Decoding, KV Cache / Prefix Caching, FP8/AWQ/GPTQ, Tensor Parallelism, Kubernetes ML Serving, KServe, OpenShift AI, Helm/Operators, GPU Orchestration, Run:AI, Performance Benchmarking, CUDA/NCCL/MIG, Prometheus/Grafana, ML Observability, GuideLLM, Locust.
Responsibilities:
- Build, configure, and operate on-prem Kubernetes/OpenShift AI platforms for deploying and serving GenAI models and LLM inference workloads.
- Design and optimize high-performance inference stacks using vLLM, TensorRT-LLM, Triton Inference Server, SGLang, and advanced techniques (continuous batching, speculative decoding, KV caching).
- Manage GPU orchestration and capacity using Run:AI, MIG, CUDA/NCCL, and tensor parallelism to maximize utilization and throughput.
- Deploy and operate Kubernetes ML serving frameworks (KServe, Helm, Operators) for scalable, reliable model serving.
- Drive inference optimization and benchmarking, leveraging FP8, AWQ, GPTQ, and performance tools such as GuideLLM and Locust.
- Implement observability and ML monitoring using Prometheus, Grafana, and Arize AI to ensure SLA/SLO compliance for GenAI services.
- Collaborate with ML and research teams to onboard new models, tune inference performance, and productionize GenAI use cases.