What are the responsibilities and job description for the Site Reliability Engineer position at Insight Global?

We are seeking two Site Reliability Engineers to deliver hypercare support to our customers for a few hours each day, including holidays and weekends. Candidates should have at least four years of experience working with Azure.

Day-to-Day Responsibilities

Daily Hypercare:

Provide hypercare support to customers, ensuring their operational needs are met.
Review observability dashboards daily to assess system performance and identify issues.

Monitoring and Alerting Enhancements:

Enhance monitoring and alerting systems to quickly detect and respond to issues.
Continuously evaluate and improve existing monitoring tools to ensure effectiveness.

System Monitoring & Maintenance:

Actively monitor ongoing system health to keep services up and running.
Take proactive measures to mitigate potential disruptions.

Incident Response:

Engage with teams during system-wide outages to troubleshoot and resolve issues promptly.
Collaborate with cross-functional teams post-incident to analyze causes and implement preventative measures.

Support for Recurring Issues:

Provide support for recurring technical issues and use what is learned to inform and improve system architecture.
Analyze failure trends and recommend architectural improvements to maximize system reliability.

Documentation Development:

Develop troubleshooting documentation for support teams to streamline issue resolution.
Create comprehensive training materials for Level 1 and Level 2 support teams to enhance their understanding and capabilities.

Usage Pattern Analysis:

Analyze system usage patterns to identify future infrastructure and capacity needs.
Provide insights and recommendations to ensure the systems can effectively scale.

Scalability Assurance:

Ensure that systems are designed for effective scaling to prevent outages during peak usage times.
Implement strategies to maintain high availability and performance levels.

Required Skills and Experience

At least 4 years of experience with Azure components and extensibility
Strong programming knowledge of C# and SQL on Azure
Drive to deliver quickly and effectively

Areas of expertise/contribution for leveling (Technical):

Infrastructure as code: leverage cloud technologies to meet our goals
Systems: manage, configure, and troubleshoot operating systems and storage (block and object); administer high-availability Azure resources
Monitoring and instrumentation: implement metrics and log management
Engineering practices: availability, reliability, scalability, and disaster recovery
Identify features for the product team (API failures, technical debt, and prioritization)

Salary: $45 - $65
