What are the responsibilities and job description for the Site Reliability Engineer position at Matlen Silver?
Role Overview
We are seeking a Cloud Site Reliability Engineer (SRE) with a strong operations/production support background to support the Terraform platform within an internal cloud environment. This is not a development role; the focus is on maintaining, monitoring, and supporting the platform in a live production setting.
Key Responsibilities
Provide production and operational support for the Terraform platform
Monitor system availability, performance, latency, and health
Handle incident management, troubleshooting, and root cause analysis
Support L2/L3 issues, manage tickets, and resolve user queries
Work closely with engineering teams to resolve production issues
Perform business-as-usual (BAU) activities and ensure system reliability
Create and maintain scripts for automation and operational efficiency
Required Skills
5 years of experience in SRE / Production Support / Operations
Strong experience with Unix/Linux environments
Proficiency in Python, Shell (Bash), and/or Ansible
Experience supporting Terraform (operational support, not development)
Hands-on experience with incident management and monitoring tools (e.g., Splunk, Dynatrace)
Experience with Jira (ticketing) and Bitbucket
Knowledge of internal/private cloud environments
Strong communication skills and experience working with end users
Preferred Skills
Exposure to PowerShell scripting
Basic knowledge of public cloud platforms (AWS, Azure, GCP)
Experience in enterprise or financial environments