What are the responsibilities and job description for the Lead Site Reliability Engineer position at Liberty Personnel Services, Inc.?
Job Details:
Lead Site Reliability Engineer
The Lead Site Reliability Engineer is a senior technical leader responsible for the reliability, availability, and operational excellence of a cloud-based infrastructure and distributed platform. This role owns uptime, SLAs, and incident response while driving long-term improvements in resilience, observability, and automation. The Lead SRE is hands-on and partners closely with platform, QA, and development teams.
This role suits an engineer who thrives in high-ownership environments, balancing real-time operations with strategic reliability initiatives. You’ll define operational standards, disaster recovery practices, and automation frameworks, while leading incidents and postmortems with clarity and accountability.
Key Responsibilities
- Own uptime, SLAs, and overall platform reliability
- Lead incident response, root-cause analysis, and postmortems
- Automate infrastructure, deployments, and operational workflows
- Improve monitoring, alerting, and observability
- Execute and evolve disaster recovery and business continuity plans
- Optimize cloud and Kubernetes environments for scale and performance
- Establish runbooks, operational standards, and reliability best practices
- Provide technical leadership and mentorship
Qualifications
- 6 years in SRE, DevOps, or Platform Engineering; 2 years in a lead role
- Strong experience supporting production systems with strict SLAs
- Deep expertise in Kubernetes, containers, and cloud infrastructure
- Proficiency with Terraform and modern IaC practices
- Strong automation and scripting skills (Bash, Python, or Go)
- Experience with CI/CD, GitOps, and observability tooling
- Proven incident leadership and cross-functional communication skills
Josh Zeloyle
www.libertyjobs.com
610-684-8676
jz@libertyjobs.com
https://www.linkedin.com/in/joshuazeloyle/
#sre
#devops