What are the responsibilities and job description for the Site Reliability Engineer position at Optomi?
Optomi, in partnership with a leading organization, is looking for a Site Reliability Engineer (SRE) to join their team.
Position Summary: This role focuses on maintaining and optimizing cloud environments, primarily in AWS (80%), with some exposure to GCP (15%) and Azure (5%). The Site Reliability Engineer will work on existing Terraform infrastructure, ensuring systems are efficient and functional, without the need to write Terraform from scratch. Candidates should be comfortable scripting, using tools like Splunk, and speaking in front of stakeholders and leadership, including calls with over 100 attendees. The position is onsite two days a week at the Kirkman Point office, with flexibility to choose those days from Tuesday, Thursday, or Friday. The contract is initially for four months, with the possibility of extension.
What the right candidate will enjoy:
- Working in a dynamic cloud environment with exposure to multiple platforms (AWS, GCP, and Azure).
- Opportunities to collaborate with leadership and stakeholders.
- A supportive team environment with potential for contract extension.
What type of experience does the right candidate have:
- Strong expertise in AWS and GCP cloud environments.
- Proficiency in scripting.
- Experience maintaining and modifying existing Terraform infrastructure.
- Familiarity with monitoring tools like Splunk.
- Confidence speaking in front of stakeholders and large groups.
What the responsibilities are of the right candidate:
- Maintain and optimize cloud infrastructure, primarily in AWS and GCP.
- Work on existing Terraform setups to ensure system functionality.
- Collaborate with stakeholders and leadership, including participation in large-scale calls.
- Utilize monitoring tools like Splunk to ensure system performance.
- Support the team with scripting and automation as needed.
- The position increasingly requires an understanding of code, but the candidate does not have to be a developer. They need to recognize when a problem lies in the code and be able to push back, explaining that it is a code issue rather than an infrastructure issue.
- Support AWS, Azure, GCP, Akamai, on-premises infrastructure, F5, and probably some other platforms.
- AWS is the largest cloud provider in the environment; the work usually involves AWS plus one other platform.
- Terraform is used for infrastructure as code (IaC), with some configuration in Chef. The candidate must be able to troubleshoot and communicate with executives and customers.
- Join bridge calls during active incidents, often with multiple executives pushing for answers.
- Both Azure and GCP are in use; the candidate should have strength in at least one of the two, coupled with AWS.
- The on-premises infrastructure is mostly middleware, so knowledge of Linux or Windows is needed.
- The team recently lost a GCP engineer and has more GCP work coming in than the remaining staff can cover.
- New Azure servers need to be built to the organization's standards, there is a lot of Windows work in Azure, and the candidate must be able to troubleshoot CI/CD pipelines.
- The team tends to ask for some kind of scripting or coding background. The work is not always writing code from scratch; it includes troubleshooting when an application team can't find its own problem.
- The team tends to use Python, Bash, or Node; being able to read and understand Node will be helpful.
- Specific languages matter less as long as the candidate can think and learn quickly.
- The team has had two people with more coding experience, but the hiring manager is not picky about coding depth.
- Enterprise experience with Splunk, AppDynamics, and multiple other tools.