What are the responsibilities and job description for the Site Reliability Engineer (SRE) - Azure position at Jobs via Dice?
Dice is the leading career destination for tech experts at every stage of their careers. Our client, Matlen Silver, is seeking the following. Apply via Dice today!
SO5 - Public Cloud - Azure Sub-effort 68: The requested Azure SRE role will ensure reliability and operational excellence for Azure platform services delivered through the product operating model.
The role partners closely with horizontal teams across platform engineering, networking, security, and AI infrastructure to embed reliability into Azure-native solutions.
Responsibilities include supporting observability, incident response, automation, resilience testing, and operational runbook development for Azure landing zones, ingress and egress DMZs, service integration layers, and AI infrastructure supporting Microsoft Foundry and private OpenAI access.
The role ensures platform components meet defined reliability, availability, and scalability objectives while enabling consistent onboarding and sustained operation of enterprise workloads in Azure.
Primary Skill: Microsoft Azure
Desired Skills
- Experience:
- Direct experience supporting AI infrastructure in Azure, including platforms enabling Microsoft Foundry and private OpenAI access.
- Strong familiarity with designing and operating highly available service integration layers and network perimeters in complex enterprise environments.
- Advanced experience integrating reliability objectives into platform product operating models and influencing reliability standards across horizontal teams.
- Demonstrated strong skills in automation, platform observability design, and driving operational excellence across shared Azure services.
- Experience:
- Strong experience operating and supporting Azure platform services with a clear focus on reliability, availability, and scalability.
- Hands-on experience with Site Reliability Engineering practices, including observability, incident response, automation, resilience testing, and operational readiness.
- Experience collaborating effectively with platform engineering, networking, security, and infrastructure teams to embed reliability into Azure-native solutions.
- Experience supporting enterprise Azure landing zones, ingress and egress DMZs, service integration layers, and shared platform components.
- Experience developing and maintaining operational runbooks and ensuring platforms support consistent onboarding and sustained operation of enterprise workloads.
- Solid experience operating secure Azure environments and supporting AI infrastructure workloads at scale.
Tertiary Skill: Cloud Architect