Systems Reliability Engineer (Contract, Remote) at Russell, Tobin & Associates: Responsibilities and Job Description
Job Details
Hello,
My name is Mohammed Tousif, and I am a recruiter with Russell Tobin. I came across your resume and would like to discuss your current employment situation in more detail. Below is a position that you may be a great fit for.
If you are not looking for an immediate opportunity, I would still love to connect. And if you are not interested in this position, please feel free to pass it along to friends and colleagues who may be.
Title: Systems Reliability Engineer
Location: Remote
Contract role
Pay Range: $55-$69 per hour
Key Responsibilities:
Contribute to the SRE strategy and establish best practices for release management, automation, and system reliability.
Mentor and guide SRE, Engineering, and Product teams in adopting core SRE principles such as service ownership, reducing toil, and continuous improvement.
Lead initiatives across SLIs/SLOs, observability, incident management, and postmortem practices, ensuring insights and learnings are captured and acted upon.
Champion SRE practices by implementing repeatable templates for logging, monitoring, and alerting frameworks.
Drive observability and monitoring excellence using tools such as Grafana, AppDynamics (AppD), and Sumo Logic, ensuring proactive detection and resolution of issues.
Partner with engineering to design reliable, fault-tolerant systems and reduce operational toil through automation.
Implement and leverage the Ansible Automation Platform to help teams automate infrastructure provisioning, configuration management, and event-driven workflows.
Enable teams to automate operational events and infrastructure changes, reducing manual intervention and improving system resilience.
Exercise sound judgment to ensure operational compliance with security, privacy, audit, disaster recovery, and other company requirements.
Job-Specific Skills, Experience & Education
Required
Minimum of 5 years of experience in Site Reliability Engineering, IT operations, or related fields.
Bachelor's degree in computer science, engineering, or equivalent experience (two additional years of experience in lieu of a degree).
Technical expertise in system reliability, scalability, application design, and performance.
Hands-on experience with observability and monitoring tools such as Grafana, AppDynamics, and Sumo Logic.
Experience with automation platforms, particularly Ansible, for infrastructure and event-driven automation.
Proven ability to mentor and guide engineers in adopting SRE practices and principles.
Excellent communication and collaboration skills across diverse teams and vendors.
Strong judgment and problem-solving capabilities.
Experience working in multi-cloud environments.
Strong interpersonal, organizational, communication, and customer service skills.
Must be authorized to work in the U.S.
Preferred
Experience applying ITIL, SRE, and IT process best practices.
Experience in tracking major incidents, rollbacks, and hotfixes; leading root cause analysis (RCA) processes; and ensuring resolution and completion of action items.
Experience with technical engineering in IT operations.
Mohd Tousif
Senior Associate - COE
Ext. 0268
420 Lexington Ave, 30th Fl.
New York, NY 10170