What are the responsibilities and job description for the Mid-level Site Reliability Engineer (SRE) position at Catapult Solutions Group?

12-month contract for a Mid-level Site Reliability Engineer (SRE) in Richardson, TX (onsite mandatory: 3 days per week; days vary, so candidates should be flexible)

***We can ONLY consider candidates local to Richardson, TX for this role***

As a Site Reliability Engineer supporting backend services for a large-scale SaaS collaboration platform, you will play a critical role in ensuring the reliability, scalability, and resilience of services used by millions of users globally. This role focuses on operational excellence, automation, and continuous improvement across cloud and hybrid environments.

Responsibilities:

Own deployment, operation, and reliability of critical collaboration services across cloud and hybrid environments
Participate in on-call rotations for production systems, respond to alerts and incidents, and drive timely mitigation and resolution
Lead and contribute to production incident response, including triage, mitigation, root cause analysis, and post-incident reviews
Design, enhance, and operate CI/CD pipelines and automation frameworks to improve deployment safety, reliability, and recovery
Use standard alerting, monitoring, and observability tooling to detect issues, reduce mean time to recovery, and improve service health
Develop and maintain runbooks, escalation procedures, and operational documentation to support reliable production operations
Leverage observability and operational data to support capacity planning, scaling decisions, and resource optimization
Establish and promote operational best practices and a culture of reliability, accountability, and continuous improvement

Qualifications:

Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent practical experience
3-5 years of experience in Site Reliability Engineering, Cloud Operations, DevOps, or a related role
Hands-on experience operating production services in cloud or hybrid environments
Production experience with containerized workloads using Docker and Kubernetes, including deployments, troubleshooting, and scaling
Experience participating in on-call rotations and responding to real-world production incidents with a sense of urgency and proactive action
Proficiency in one or more scripting or programming languages such as Python, Go, or Bash for automation and operational tooling
Experience building or maintaining CI/CD pipelines and supporting production deployments and rollbacks
Experience using Infrastructure as Code tools such as Terraform or Ansible to manage production infrastructure
Solid understanding of Linux systems, networking, distributed systems, Git-based development workflows, and infrastructure operations
Experience operating workloads in at least one major cloud platform such as AWS, Azure, or GCP, including core services like IAM, networking, and compute

Required Tools and Technologies:

Docker
Kubernetes
Linux
Python
Go
Bash
CI/CD platforms
Git-based version control
Cloud platforms including AWS, Azure, or GCP
Infrastructure as Code tooling
Monitoring and observability tools such as Prometheus, Grafana, Datadog, or CloudWatch
Incident management and alerting tools such as PagerDuty or equivalent

Salary: $40 - $42
