What are the responsibilities and job description for the Senior Site Reliability Engineer position at Optomi?

SRE (Site Reliability Engineer)

On-site 4 days a week | Plano, TX

Optomi, in partnership with a client in the financial services sector, is seeking a Senior Site Reliability Engineer (SRE) to ensure the reliability, performance, and availability of the applications within each domain. As a Senior SRE - Applications, you will work with development engineers, product owners, the SRE Infrastructure team, production engineers, and Technology Operations Center personnel, with a primary focus on improving observability, automation, overall system health, reliability, and uptime.

Key Responsibilities:

Design, code, and maintain automation to streamline operations, reduce manual tasks, and improve system efficiency to enable a robust application environment.
Work with observability engineers to provide actionable insights into application and infrastructure health and performance.
Foster a collaborative team culture and support professional development.
Ensure scalable, repeatable code deployments through CI/CD pipelines built with GitHub & Harness, and repeatable infrastructure deployments through infrastructure as code (IaC) using Terraform.
Build automation and operational runbooks primarily using Python scripting.
Manage container orchestration platforms and related cloud-native services.
Drive reliability improvements through Service Level Objectives (SLOs), error budgets, and Service Level Agreements (SLAs) aligned with business goals.
Design & implement observability improvements using Dynatrace & CloudWatch.
Lead major incident response, coordinate with stakeholders through resolution, and drive problem management to prevent recurrence.
Conduct blameless post-incident reviews and drive continuous improvement.
Collaborate cross-functionally to embed SRE principles into application design and operations so that applications meet their reliability goals.
Participate in architectural reviews, providing input on reliability and scalability.

Key Qualifications:

Experience with DevOps tools like GitHub, Harness & Dynatrace.
Experience building self-healing systems and automated remediation workflows.
Demonstrated problem-solving ability and experience with key SRE/DevOps concepts & tools, with a proven track record of achieving high system reliability and performance.
Strong experience with Terraform for AWS IaC.
Proficient in scripting and automation with Python and familiar with monitoring and logging tools (e.g., Prometheus, Grafana, ELK Stack).
Deep knowledge of cloud platforms (e.g., AWS, GCP, Azure) and container orchestration technologies (e.g., Kubernetes/EKS).
Effective communication skills, with the ability to convey complex technical concepts to diverse audiences.

Preferred Qualifications:

AWS certifications (DevOps Engineer, Solutions Architect, etc.).
Familiarity with GitOps, secrets management, and infrastructure monitoring best practices.

Salary: $70 - $75
