What are the responsibilities and job description for the Senior Site Reliability Engineer position at Triunity Software, Inc.?
Note: We prefer local candidates, as an in-person meeting may be needed during the final round of discussions.
Role Summary
We are looking for a strong hands-on Lead AI-Assisted SRE / AIOps Engineer to help operationalize and scale an SRE agent-driven operations model. This role will lead the onboarding of existing scripts, SOPs, and operational workflows into the SRE agent while also supporting production releases, validation, incident response, and operational governance.
This is not a pure support role. The ideal candidate must be technically strong, practical, and capable of using independent judgment rather than relying blindly on AI outputs.
Experience
- Total 13 years of experience required and around 5 years of hands-on experience in IT operations, cloud operations, SRE, platform support, or production engineering
- Experience working with monitoring, observability, scripting, and release validation
- Must have experience in AIOps, AI-assisted operations, or automation-led support models
- Lead the adoption and operationalization of the SRE agent across support and reliability workflows
- Translate existing scripts, runbooks, SOPs, and operational knowledge into agent-compatible workflows
- Work with teams to identify which use cases should be automated, semi-automated, or remain human-driven
- Validate agent outputs, recommendations, and remediation steps before operational use
- Support production releases, release validation, smoke testing, and post-release health checks
- Drive troubleshooting during incidents and ensure proper root cause analysis and follow-through
- Improve alert handling, event correlation, and operational response patterns
- Coordinate with engineering, operations, and platform teams on onboarding and process changes
- Mentor junior engineers and guide them on workflow design, validation, and operational execution
- Maintain high-quality documentation, runbooks, and operational standards
Required Technical Skills
- Strong hands-on scripting experience in PowerShell/Python/Shell/Bash
- Experience with alert management/monitoring and automation, LLM/AI/MLOps/ChatGPT, RAG/LangChain, AI chatbot development/configuration, or similar AI-driven operational solutions
- Experience with monitoring, alerting, logs, dashboards, and incident workflows
- Good understanding of production support processes, release support, and validation practices
- Experience with cloud platforms, i.e., Azure (must have)
- Familiarity with ITSM/ticketing tools such as ServiceNow, Jira, or similar
- Ability to understand existing operational scripts and modernize them into scalable workflows
- Experience with APIs, integrations, or automation pipelines is preferred
- Exposure to Kubernetes/AKS and AI tools (ChatGPT, Copilot) is a plus