What are the responsibilities and job description for the AI-Ops Engineer position at SOAL Technologies, LLC?
Job Details
Responsibilities:
AI-Driven Operations & Automation
• Implement AIOps solutions using ML to automate performance monitoring, workload scheduling, and infrastructure operations.
• Build anomaly detection systems to identify system issues before they impact users.
• Develop automated root cause analysis using ML-driven event correlation.
• Create predictive maintenance workflows based on historical patterns and telemetry data.
• Design and execute automated remediation scripts for incident response.
Observability & Intelligent Monitoring
• Build observability platforms that aggregate logs, metrics, and events into unified dashboards.
• Implement intelligent alerting using NLP/ML to reduce noise and prioritize actionable insights.
• Deploy APM tools integrated with AI-powered analytics.
• Ensure full visibility across cloud infrastructure, applications, and ML workloads.
Cloud Infrastructure & DevOps
• Design and maintain scalable AWS infrastructure using CloudFormation, Terraform, or CDK.
• Build and manage containerized workloads (Docker, ECS, Fargate, EKS).
• Create CI/CD pipelines incorporating AI-driven deployment and quality checks.
• Automate cloud operations to optimize cost, scalability, and reliability.
• Ensure all cloud architecture meets Stanford’s compliance requirements (FERPA, GDPR).
Collaboration & Continuous Improvement
• Partner with engineers and cross-functional teams to deliver AIOps capabilities.
• Use Git-based workflows and participate in code reviews.
• Document runbooks, automation workflows, and operational procedures.
• Continuously evaluate emerging AIOps tools and methodologies.
• Contribute to building a culture focused on predictive and automated operations.
Qualifications:
Required
• Bachelor’s degree in Computer Science, DevOps, Cloud Engineering, or related field (Master’s preferred).
• 3 years of experience in DevOps, SRE, or Cloud Engineering roles.
• 2 years of hands-on experience with AWS (EC2, Lambda, ECS/Fargate, S3, IAM, VPC).
• Strong Python programming skills.
• Experience implementing monitoring and observability solutions at scale.
• Familiarity with ML/AI concepts applied to automation.
Technical Skills
• Languages: Python required; Bash, Go, or TypeScript preferred.
• Monitoring Tools: CloudWatch, X-Ray, Prometheus, Grafana, Datadog, Splunk.
• Infrastructure as Code: CloudFormation, Terraform, CDK.
• Containers & Orchestration: Docker, ECS/Fargate, Kubernetes (EKS).
• AWS Services: Lambda, EC2, S3, API Gateway, EventBridge, CloudWatch, IAM, CodePipeline, SageMaker.
• CI/CD: GitHub Actions, CodePipeline, Jenkins, GitLab CI.
• Data & Analytics: Log aggregation, metrics analysis, event correlation.
Desired Attributes
• Strong understanding of AIOps principles and automation-first operations.
• Passion for eliminating manual work through AI-driven solutions.
• Excellent debugging and root cause analysis skills.
• Adaptable, collaborative, and eager to learn, with strong communication skills.
• Thrives in fast-paced environments with evolving technology stacks.
Salary: $50 - $60