What are the responsibilities and job description for the System Engineer position at Swoon?
Pay: $59 - $69/hr (depending on experience)
Location: Seattle, WA (Hybrid)
Duration: 12 months (strong potential for FTE conversion)
About the Role
We are supporting one of our clients in hiring a Systems Reliability Engineer to help drive reliability, automation, and operational excellence across critical engineering platforms. In this role, you will champion core SRE principles, improve observability, implement automation using Ansible, and work closely with engineering and product teams to build scalable, fault-tolerant systems. This is a hands-on role focused on reducing toil, strengthening monitoring and alerting practices, and ensuring systems meet security, audit, and compliance standards.
What You’ll Do
- Contribute to SRE strategy and help define best practices around release management, automation, and system reliability.
- Mentor and guide engineering and product teams in adopting SRE principles such as service ownership, SLIs/SLOs, and continuous improvement.
- Lead efforts in observability by creating repeatable standards for logging, monitoring, dashboards, and alerting frameworks.
- Improve monitoring maturity using tools like Grafana, AppDynamics (AppD), and Sumo Logic, ensuring proactive issue detection and quick resolution.
- Partner with engineering teams to design reliable, scalable, and fault-tolerant systems across multi-cloud environments.
- Reduce operational toil by implementing automation through the Ansible Automation Platform, enabling event-driven workflows and infrastructure-as-code.
- Support incident management by participating in major incident response, postmortems, and follow-up action tracking.
- Ensure operational compliance across security, privacy, audit, and disaster recovery requirements.
- Contribute to system documentation, operational runbooks, and knowledge sharing across teams.
- Collaborate cross-functionally with engineering, DevOps, product, and vendor partners to improve stability and performance across services.
What We’re Looking For
- 5 years of experience in Site Reliability Engineering, IT operations, DevOps, or similar roles.
- Strong technical background in reliability engineering, system performance, and scalable architecture.
- Hands-on experience with observability and monitoring tools such as Grafana, AppDynamics, and Sumo Logic.
- Experience with Ansible for infrastructure provisioning, configuration management, and automation workflows.
- Ability to mentor engineers and promote SRE principles across diverse teams.
- Strong judgment, troubleshooting, and problem-solving skills.
- Experience working in multi-cloud environments (AWS, Azure, GCP).
- Excellent communication, collaboration, and customer service skills.
- Bachelor’s degree in Computer Science, Engineering, or equivalent experience.
- Must be legally authorized to work in the U.S.
- Preferred: Experience applying ITIL/SRE best practices, managing major incidents, driving RCAs, and supporting hotfix or rollback processes.