What are the responsibilities and job description for the Sr Kubernetes Engineer position at LHi Group?
Senior AI/HPC Infrastructure Engineer (Kubernetes) – Next-Generation Cloud & HPC Platform
I’m searching for a Senior Kubernetes Engineer for a client that is building one of the most advanced GPU-accelerated compute platforms in the industry. They enable high-performance workloads for AI/ML, HPC, and LLM training across hybrid and on-prem environments, working at the forefront of scalable, multi-tenant cloud infrastructure.
As a Senior Kubernetes Engineer, you will design, implement, and optimize GPU-enabled container platforms, ensuring high-throughput and reliable operations. Day-to-day, you’ll be architecting and managing Kubernetes clusters, integrating NVIDIA GPUs with device plugins and MIG capabilities, automating infrastructure services with custom operators, and collaborating with HPC, ML, and DevOps teams to maximize GPU utilization and cluster performance. You’ll also contribute to monitoring, observability, CI/CD pipelines, and infrastructure-as-code for large-scale, multi-user environments.
Key experience:
- Production-grade Kubernetes, including GPU scheduling and cluster optimization
- NVIDIA GPUs, device plugins, MIG, and GPU resource management
- Python or Go for custom Kubernetes operators and controllers
- Helm, Kustomize, and GitOps workflows (ArgoCD, FluxCD)
- GPU-intensive workloads such as AI/ML training, HPC, or LLMs
- Observability tooling such as Prometheus, Grafana, and DCGM Exporter
- Familiarity with multi-tenant security, RBAC, and policy enforcement
This is a full-time, direct-hire opportunity.
If interested, please apply here!