AI DevOps Engineer at Alignity Solutions: Responsibilities and Job Description
We are an IT Solutions Integrator/Consulting Firm helping our clients hire the right professional for an exciting long-term project. Here are a few details.
Requirements
About the Role
Our client is seeking a highly skilled AI DevOps Engineer to design, build, and operate scalable, secure, production-grade infrastructure that supports modern AI platforms and LLM-powered applications.
This role sits at the intersection of DevOps, Platform Engineering, Site Reliability Engineering (SRE), and AI Infrastructure. It enables high-performance AI systems, agent-based workflows, and enterprise AI platforms within a regulated financial services environment.
The ideal candidate will have strong expertise in Kubernetes, Terraform, cloud infrastructure, automation, and AI platform operations, along with experience supporting modern AI/LLM workloads in production environments.
Key Responsibilities
- Design, deploy, and manage scalable infrastructure for AI and LLM-based applications in production environments.
- Build and maintain Infrastructure-as-Code (IaC) using tools such as Terraform for secure, repeatable, and auditable deployments.
- Deploy, manage, and scale containerized environments using Kubernetes, with a focus on high availability and reliability.
- Implement DevOps, Platform Engineering, and SRE best practices to improve system reliability, scalability, and operational efficiency.
- Support AI platform services for model serving, inference, experimentation, and evaluation workflows.
- Deploy and maintain infrastructure supporting AI agents, orchestration frameworks, and LLM runtime dependencies.
- Design and manage vector database infrastructure, such as Pinecone, Weaviate, or PostgreSQL with pgvector, for retrieval-augmented generation (RAG) and semantic search use cases.
- Enable AI developer platforms and tooling for engineering teams building AI-powered applications.
- Implement monitoring, alerting, logging, and incident response processes for mission-critical AI systems.
- Collaborate with security, compliance, and governance teams to ensure adherence to regulatory and enterprise security standards.
- Continuously improve automation, developer experience, and operational processes for AI infrastructure environments.
Required Qualifications
- Bachelor’s degree in Computer Science, Engineering, or equivalent practical experience.
- Proven experience as a DevOps Engineer, Platform Engineer, or Site Reliability Engineer (SRE).
- Strong hands-on experience managing large-scale production infrastructure.
- Expertise with Terraform and Infrastructure-as-Code (IaC) methodologies.
- Strong experience deploying and operating Kubernetes-based environments.
- Experience supporting infrastructure for AI platforms or LLM-based applications.
- Strong understanding of automation, scalability, reliability, and cloud-native architectures.
Preferred Qualifications
- Experience supporting production-grade LLM applications and AI agent workloads.
- Hands-on experience with vector databases such as Pinecone, Weaviate, or PostgreSQL with pgvector.
- Experience building or supporting AI tooling and internal AI developer platforms.
- Knowledge of observability, monitoring, capacity planning, and reliability engineering for AI/ML systems.
- Experience working in financial services or other highly regulated industries.
- Strong communication and cross-functional collaboration skills.
Benefits