What are the responsibilities and job description for the Site Reliability Engineer-W2 position at SR Talent Solution Inc.?
Title: Site Reliability Engineer-W2
Work Location: Chandler, AZ (Hybrid)
Candidates must be local to Chandler, AZ
Duration: 12 Months
Key responsibilities
● Collaborate with Development and Infrastructure teams to understand technical solutions and to implement the monitoring capabilities outlined in the application and system monitoring designs put forward by the SRE Lead.
● Mentor SRE resources on reliability practices and established tools/capabilities.
● Develop and maintain a catalog of extensible reliability scripts, tools and libraries that can be leveraged for common instrumentation, automation, and operational needs.
● Partner with infrastructure engineers and application teams to implement the code changes needed to adopt common reliability libraries and tools, and help Application Production Services (APS) and Application Development teammates understand how to use them.
● Engage as a subject matter expert (SME) in major incident triage efforts and failure scenario modelling, and work with the Problem Manager to diagnose root causes for major incident / problem management investigations.
● Identify vulnerabilities and opportunities for reliability improvement, such as investigating low-level error rates and 'noise' in monitoring, and help define solutions to reduce manual support effort and/or improve system reliability.
● Participate regularly in an on-call rotation with Production Support teammates to learn more about reliability issues affecting their portfolio.
Primary Skill: Site Reliability Engineer
• Hands-on experience with Microsoft Azure infrastructure and platform services.
• Experience with Terraform, Terraform Enterprise, or comparable infrastructure-as-code tooling.
• Working knowledge of Azure networking, including VNets, subnets, route tables, firewalls, DNS, private endpoints, and load balancing.
• Experience with Azure Monitor, Azure Log Analytics, Dynatrace, Splunk, Prometheus, Grafana, or similar observability platforms.
• Experience with CI/CD tooling such as Jenkins, Azure DevOps, GitHub Actions, or comparable pipeline frameworks.
• Scripting or programming experience with Python, PowerShell, Bash, or similar languages.
• Understanding of cloud resiliency, high availability, logging, monitoring, and operational support practices.
• Strong analytical, troubleshooting, organizational, and communication skills.
• Ability to collaborate effectively with globally distributed engineering and operations teams.
Desired Qualifications
• Microsoft Azure certification.
• Experience in financial services, regulated technology, or other enterprise-scale environments.
• Experience with AKS, ACR, Kubernetes, container registries, or container-based deployment models.
• Experience with Azure AI Foundry, OpenAI/GenAI platform operations, model-serving observability, or AI platform readiness.
• Familiarity with SRE practices such as SLIs, SLOs, incident response, problem management, post-incident reviews, and toil reduction.
• Exposure to IAM, cloud security, policy-as-code, vulnerability remediation, and governance dashboards.