What are the responsibilities and job description for the Technology Services Analyst - Site Reliability position at Oliver James?
Position Overview
We're looking for an experienced and adaptable Site Reliability Analyst to join a growing Technology Services team. This individual will play a key role in ensuring the operational integrity and long-term scalability of our platforms. The position combines traditional IT support responsibilities with modern reliability engineering methods to create a stable and resilient technology environment that aligns with business priorities.
Key Responsibilities
- Partner with engineering and infrastructure teams to assess the performance, resilience, and availability of systems. Advise on design decisions that impact operational reliability.
- Simulate potential failure scenarios when new features or architectural changes are deployed. Lead analysis sessions following service disruptions to drive improvements.
- Design and coordinate controlled failure testing (chaos engineering) to validate system robustness. Help execute performance assessments to support product readiness.
- Provide expert-level support during system outages or client-affecting incidents, leading troubleshooting efforts.
- Ensure system performance targets and reliability metrics are effectively defined and maintained.
- Create and update recovery documentation (runbooks) for critical systems, and guide SRE tool and process adoption across teams.
- Monitor usage patterns and plan for future capacity needs so that systems stay responsive as demand grows.
- Keep infrastructure configurations consistent and up to date across various environments.
- Support ad hoc projects and contribute to broader technology initiatives as needed.
Qualifications
- A bachelor's degree in Computer Science, Information Systems, or a related field, or equivalent practical experience.
- 5 years of professional experience in technology operations, systems analysis, or site reliability engineering.
- Proven ability to diagnose complex technical issues and communicate solutions clearly.
- Familiarity with monitoring platforms, incident management practices, and vendor oversight.