What are the responsibilities and job description for the Senior Site Reliability Engineer position at Optomi?
SRE (Site Reliability Engineer)
On-site 4 days a week | Plano, TX
Optomi, in partnership with a client in the financial services sector, is seeking a Senior SRE to ensure the reliability, performance, and availability of the applications within each domain. As a Senior SRE (Applications), you will work with development engineers, product owners, the SRE Infrastructure team, production engineers, and Technology Operations Center personnel, with a primary focus on improving observability, automation, overall system health, reliability, and uptime.
Key Responsibilities:
- Design, code, and maintain automation to streamline operations, reduce manual tasks, and improve system efficiency to enable a robust application environment.
- Work with observability engineers to deliver actionable insights into application and infrastructure health and performance.
- Foster a collaborative team culture and support professional development.
- Ensure scalable, repeatable code deployments through CI/CD pipelines built with GitHub & Harness, and repeatable infrastructure deployments through infrastructure as code (IaC) using Terraform.
- Build automation and operational runbooks primarily using Python scripting.
- Manage container orchestration platforms and related cloud-native services.
- Drive reliability improvements through Service Level Objectives (SLOs), error budgets, and Service Level Agreements (SLAs) aligned with business goals.
- Design & implement observability improvements using Dynatrace & CloudWatch.
- Lead major incident response, coordinate with stakeholders through resolution, and drive problem management to prevent recurrence.
- Conduct blameless post-incident reviews and drive continuous improvement.
- Collaborate cross-functionally to embed SRE principles into application design and operations so that applications meet their reliability goals.
- Participate in architectural reviews, providing input on reliability and scalability.
Key Qualifications:
- Experience with DevOps tools like GitHub, Harness & Dynatrace.
- Experience building self-healing systems and automated remediation workflows.
- Demonstrated problem-solving ability and experience with key SRE/DevOps concepts and tools, with a proven track record of achieving high system reliability and performance.
- Strong experience with Terraform for AWS IaC.
- Proficiency in Python scripting and automation, and familiarity with monitoring and logging tools (e.g., Prometheus, Grafana, ELK Stack).
- Deep understanding of cloud platforms (e.g., AWS, GCP, Azure) and container orchestration technologies (Kubernetes/EKS).
- Effective communication skills, with the ability to convey complex technical concepts to diverse audiences.
Preferred Qualifications:
- AWS certifications (DevOps Engineer, Solutions Architect, etc.).
- Familiarity with GitOps, secrets management, and infrastructure monitoring best practices.
Salary: $70 - $75