What are the responsibilities and job description for the Artificial Intelligence Engineer position at HCLTech?
Senior AI Engineer – AI Center of Excellence (AI CoE)
Location: New Jersey; Dallas, TX; Santa Clara, CA; Ashburn, VA; Northern Virginia (Hybrid)
Job Type: Full-time
Domain: AI & Data Centers
Experience: 12 Years
Role Overview
This is a strategic, hands-on senior engineering role within the AI Center of Excellence (AI CoE), responsible for designing, building, and operating AI infrastructure and AI Factory platforms across hybrid environments (on‑prem, private cloud, and public cloud).
The role works closely with clients and leading OEM partners, as well as internal Sales, Pre‑Sales, and Delivery teams, to identify, shape, and execute AI‑driven business opportunities across the US and EU regions.
This is a quota‑driven, techno‑commercial role requiring deep technical execution along with stakeholder interaction and customer‑facing leadership.
Key Responsibilities
AI Infrastructure & Platform Engineering
- Design, deploy, and operate hybrid Kubernetes clusters across AWS, Azure, GCP, and on‑prem environments (bare metal, NVIDIA DGX, Grace Hopper).
- Own production-grade GPU infrastructure using NVIDIA GPU Operator, including:
  - CUDA, drivers, MIG
  - GPU‑aware scheduling and resource isolation policies
- Build and maintain high‑availability, scalable AI platforms supporting enterprise workloads.
MLOps & GenAI Platform Development
- Build production‑grade MLOps pipelines using:
  - Kubeflow Pipelines
  - GitOps (Argo CD / Flux)
  - MLflow / DVC
- Deploy and operate Large Language Models (LLMs) using:
  - NVIDIA Triton Inference Server
  - TensorRT‑LLM
  - vLLM
  - Custom FastAPI / gRPC services
- Implement advanced inference techniques:
  - Quantization, LoRA
  - Dynamic batching
  - Tenant‑level quota enforcement
  - Safety & content filtering integrations
Data & Retrieval-Augmented Generation (RAG)
- Integrate and optimize vector databases for RAG and similarity search:
  - Milvus, Pinecone, Qdrant, Weaviate, FAISS
- Enable scalable semantic search and GenAI-powered enterprise applications.
Observability, Security & Reliability
- Implement full‑stack observability using:
  - Prometheus, Grafana
  - Loki / ELK
  - OpenTelemetry
- Define and monitor SLIs / SLOs for AI platforms.
- Enforce security and compliance standards:
  - Kubernetes RBAC
  - OPA / Gatekeeper
  - Vault / KMS
  - Image signing, policy enforcement
  - GDPR / HIPAA compliance
Cost, Performance & Capacity Optimization
- Optimize GPU utilization through:
  - Capacity planning
  - Auto‑scaling & spot instances
  - Cost transparency and chargeback models
- Improve platform efficiency while maintaining performance SLAs.
Enablement & Technical Leadership
- Convert experimentation into reproducible production pipelines.
- Enable engineering teams through:
  - Technical documentation
  - Tutorials and best practices
  - Office hours and knowledge sessions
- Evaluate emerging technologies and lead PoCs across:
  - NVIDIA innovations
  - Open‑source ecosystems (Kubeflow, LangChain, vLLM, TGI, etc.)
- Drive the AI Infra & Platform technology roadmap.
Required Experience & Skills
Technical Expertise
- 8 years of hands‑on experience designing and operating production Kubernetes platforms (cloud and on‑prem).
- Deep expertise in NVIDIA GPU stack (CUDA, MIG, GPU Operator).
- Strong hands‑on experience with:
  - Kubeflow Pipelines or equivalent MLOps platforms
  - Large‑scale LLM deployment and inference optimization
- Proficiency in Python and AI frameworks:
  - PyTorch, TensorFlow
  - Hugging Face, LangChain
- Infrastructure as Code (IaC):
  - Helm, Kustomize, Terraform
- Experience with vector databases and RAG architectures.
- Strong SRE / observability background.
- Security‑first mindset with enterprise compliance exposure.
Nice to Have
- Experience with NVIDIA DGX and Grace Hopper platforms.
- Knowledge of OpenShift, k3s, or edge‑focused deployments.
- Experience with:
  - KServe, LWS, serverless inference
- Contributions to open‑source projects (Kubernetes, Kubeflow, Triton, Milvus, vLLM).
- Certifications:
  - CKA
  - Cloud AI/ML certifications
  - NVIDIA certifications
Qualifications
- B.E. / B.Tech. with a minimum of 60% across academics.
- Proven experience delivering AI solutions across on‑prem, cloud, and hybrid environments.
- Strong analytical, strategic thinking, and stakeholder communication skills.
- Solid understanding of data centers, cloud platforms, AI & GenAI ecosystems.
Role Specifics
- Hands‑on senior engineering role
- Strong Techno‑Commercial orientation
- High ownership, visibility, and impact role
Disclaimer
HCL is an equal opportunity employer, committed to providing equal employment opportunities to all applicants and employees regardless of race, religion, sex, color, age, national origin, pregnancy, sexual orientation, physical disability or genetic information, military or veteran status, or any other protected classification, in accordance with federal, state, and/or local law. Should any applicant have concerns about discrimination in the hiring process, they should provide a detailed report of those concerns to secure@hcltech.com for investigation.
Compensation and Benefits
A candidate’s pay within the range will depend on their work location, skills, experience, education, and other factors permitted by law. This role may also be eligible for performance-based bonuses subject to company policies. In addition, this role is eligible for the following benefits subject to company policies: medical, dental, vision, pharmacy, life, accidental death & dismemberment, and disability insurance; employee assistance program; 401(k) retirement plan; 10 days of paid time off per year (some positions are eligible for need-based leave with no designated number of leave days per year); and 10 paid holidays per year.
Salary: $80,000 - $120,000