What are the responsibilities and job description for the Site Reliability Engineer position at Talener?

Job Title: Site Reliability Engineer (SRE)

Location: Remote (U.S.)

Overview

A fast-growing healthcare technology organization is seeking a Site Reliability Engineer (SRE) to help scale and support a high-impact cloud platform focused on improving healthcare delivery nationwide. This role will play a critical part in strengthening platform reliability, operational efficiency, observability, and automation across production environments.

The ideal candidate is passionate about infrastructure stability, incident response, automation, and continuous improvement within modern cloud-native environments.

Key Responsibilities

Ensure the reliability, scalability, performance, and security of cloud-based infrastructure and applications
Monitor, troubleshoot, and resolve production platform and application issues across distributed systems
Lead incident response efforts, root cause analysis, and blameless post-mortems
Build and maintain operational runbooks and automated remediation workflows
Develop and enhance observability and telemetry solutions for proactive monitoring and alerting
Collaborate closely with engineering, DevOps, QA, security, and operations teams to improve platform health and deployment processes
Support infrastructure automation and configuration management initiatives
Contribute to infrastructure-as-code (IaC) practices and CI/CD operational improvements
Promote best practices around reliability engineering, incident management, and operational excellence
Participate in an on-call rotation supporting production systems, including occasional off-hours support for West Coast operations

Required Qualifications

5 years of experience in Site Reliability Engineering, DevOps, Cloud Infrastructure, or related disciplines
Strong experience troubleshooting and supporting production environments
Hands-on experience with observability and monitoring platforms such as Datadog, New Relic, or similar tools
Experience working within Azure-based cloud environments and modern containerized infrastructure
Knowledge of Docker, Kubernetes, and cloud-native application hosting environments
Experience with infrastructure-as-code tools such as Terraform, Terragrunt, or OpenTofu
Strong scripting and automation experience using PowerShell, Python, JavaScript, or similar languages
Experience with source control and CI/CD tooling (Git, Azure DevOps, etc.)
Understanding of cloud security principles, compliance frameworks, and operational best practices
Strong collaboration and communication skills within Agile engineering environments

Preferred Qualifications

Experience improving operational visibility through telemetry, dashboards, reports, and alerting systems
Experience evolving incident response processes and operational tooling
Passion for mentoring others and promoting operational excellence across teams
Strong problem-solving mindset with a focus on continuous improvement and automation

Additional Details

Opportunity to work on mission-driven technology with meaningful real-world impact
Collaborative engineering culture focused on innovation, reliability, and continuous learning
Flexible environment that supports work-life balance while maintaining operational excellence

If you are interested and qualified, please email mmclaughlin@talener.com.
