What are the responsibilities and job description for the Lead MLOps / AI Platform Engineer position at SATCON Inc?

Job Description: Lead MLOps / AI Platform Engineer

Location: Charlotte, NC

Duration: Long Term

Visa Type: & Candidates

Role Overview

We are seeking a highly skilled Lead MLOps / AI Platform Engineer to design, build, and optimize our next-generation Generative AI and Large Language Model (LLM) infrastructure. This role is pivotal in bridging the gap between cutting-edge AI research and robust production deployment. You will be responsible for orchestrating high-performance GPU environments (specifically leveraging Nvidia H200s), optimizing LLM inference, and maintaining enterprise-grade infrastructure across both Cloud (Google Cloud Platform/Azure) and On-Premise environments.

Key Responsibilities

AI Inference Optimization & Serving

Deploy, scale, and manage large-scale language models using advanced inference frameworks such as vLLM, TensorRT-LLM, SGLang, and Triton Inference Server.
Implement and fine-tune performance optimization strategies, including Continuous Batching and advanced Parallelism techniques.
Conduct load testing, benchmarking, and profiling of LLM deployments using GuideLLM and Locust to ensure optimal latency and throughput.

Cloud & Infrastructure Orchestration

Architect and maintain scalable, secure infrastructure on Google Cloud Platform and Azure using Infrastructure as Code (Terraform).
Design and execute Cloud Networking, Landing Zones, and Organization Policies/Governance.
Manage secrets and secure workloads utilizing HashiCorp Vault.
Develop and champion Internal Developer Portals to streamline workflows for data science and product teams.

On-Premise & Kubernetes Engineering

Orchestrate ML workloads on Kubernetes, utilizing KServe, OpenShift AI / OpenShift Functions, and Helm charts/Operators.
Manage compute clusters with a heavy focus on advanced GPU Orchestration (Nvidia H200 ecosystems).
Demonstrate deep hands-on architecture expertise, including the know-how to "unfold" an LLM onto highly constrained or custom on-premise hardware setups.

Observability & SRE

Implement end-to-end ML Observability and monitoring frameworks using Arize AI.
Establish Site Reliability Engineering (SRE) best practices, maintaining strict SLOs/SLIs for model deployment pipelines and production APIs.

Required Skills & Qualifications

Core AI / MLOps Stack:

Inference Engines: vLLM, TensorRT-LLM, Triton Inference Server, SGLang
ML Frameworks/Ops: KServe, OpenShift AI, Arize AI, GenAI Platforms, RAG architecture
Performance & Testing: GuideLLM, Locust, Continuous Batching, Parallelism optimization

Infrastructure & Cloud Stack:

Cloud Providers: Google Cloud Platform (GCP), Microsoft Azure
Containerization & Orchestration: Kubernetes, OpenShift, Helm/Operators, GPU Orchestration
IaC & Automation: Terraform, Python
Security & Networking: HashiCorp Vault, Landing Zones, Org Policy & Governance

Hardware Sanity Check:

Mandatory Experience: Direct, hands-on experience working with Nvidia H200 GPUs and optimizing workloads specifically for this architecture.

Salary: $60 - $70
