What are the responsibilities and job description for the Site Reliability Engineer position at Insight Global?
We are seeking two Site Reliability Engineers who can deliver hypercare support to our customers for a few hours each day, including holidays and weekends. Candidates should have a minimum of four years of experience working with Azure.
Day-to-Day Responsibilities
Daily Hypercare:
- Provide hypercare support to customers, ensuring their operational needs are met.
- Review observability dashboards daily to identify and assess system performance issues.
Monitoring and Alerting Enhancements:
- Enhance monitoring and alerting systems to quickly detect and respond to issues.
- Continuously evaluate and improve existing monitoring tools to ensure effectiveness.
System Monitoring & Maintenance:
- Actively monitor ongoing system health to keep services up and running.
- Take proactive measures to mitigate potential disruptions.
Incident Response:
- Engage with teams during system-wide outages to troubleshoot and resolve issues promptly.
- Collaborate with cross-functional teams post-incident to analyze causes and implement preventative measures.
Support for Recurring Issues:
- Provide support for recurring technical issues and use the findings to inform and improve system architecture.
- Analyze failure trends and recommend architectural improvements to maximize system reliability.
Documentation Development:
- Develop troubleshooting documentation for support teams to streamline issue resolution.
- Create comprehensive training materials for Level 1 and Level 2 support teams to enhance their understanding and capabilities.
Usage Pattern Analysis:
- Analyze system usage patterns to identify future infrastructure and capacity needs.
- Provide insights and recommendations to ensure the systems can effectively scale.
Scalability Assurance:
- Ensure that systems are designed for effective scaling to prevent outages during peak usage times.
- Implement strategies to maintain high availability and performance levels.
Required Skills and Experience
- At least 4 years of experience with Azure components and extensibility
- Strong programming knowledge of C# and SQL on Azure
- Drive to deliver quickly and effectively
Areas of Expertise/Contribution for Leveling
Technical:
- Infrastructure as code: leverage cloud technologies to meet our goals
- Systems: manage, configure, and troubleshoot operating systems and storage (block and object), and administer high-availability Azure resources.
- Monitoring and instrumentation: implement metrics and log management.
- Engineering practices: availability, reliability and scalability, as well as disaster recovery.
- Identify features for the product team (API failures, technical debt, and prioritization)
Salary: $45 - $65