What are the responsibilities and job description for the "Site Reliability Engineer" position at AJ Consulting Group, LLC?
Job Details
Title: Site Reliability Engineer (4 Openings)
Location: Raleigh, NC (Onsite)
Duration: 12 Months
VISA: U.S. Citizens and … only, due to legal or government contract requirements
Tax Term: W2
JD:
Summary:
We are seeking highly skilled Site Reliability Engineers to support the reliability, scalability, and performance of critical enterprise platforms. This role requires seasoned professionals with deep technical expertise across cloud infrastructure, operating systems, automation, and modern observability practices. The ideal candidate brings a disciplined engineering mindset, excels under pressure, and consistently drives operational excellence through metrics, automation, and continuous improvement. This is a hands-on, engineering-focused position working closely with cross-functional teams to ensure the seamless operation of complex, high-availability systems.
Responsibilities:
- Design, implement, and maintain highly reliable, scalable, and secure systems across cloud and on-prem environments.
- Manage and optimize distributed systems running on platforms such as Azure, Linux (RHEL 7), and Windows Server 2019.
- Build and improve automation workflows using scripting languages such as Python, Go, and Bash.
- Develop Infrastructure-as-Code solutions using tools like Terraform and Ansible.
- Define, monitor, and refine SLIs, SLOs, and SLAs to ensure consistent service quality.
- Reduce operational toil through automation, tooling enhancements, and process improvements.
- Integrate systems with observability platforms to ensure full operational visibility and proactive issue identification.
- Troubleshoot complex incidents, lead structured incident response efforts, and conduct detailed post-mortem analyses.
- Collaborate closely with software engineering, infrastructure, and business teams to deliver resilient and performant services.
- Identify opportunities to optimize system reliability, performance, and maintainability, taking full ownership of the problem space.
Requirements:
- Proven experience as a Site Reliability Engineer, with a background in software engineering, infrastructure, or operations.
- Hands-on experience with cloud platforms (e.g., Azure) and enterprise operating systems (Linux RHEL 7, Windows Server 2019).
- Strong understanding of networking and storage technologies including NFS, SAN, and NAS.
- Working knowledge of authentication and naming services such as DNS, LDAP, Kerberos, and Centrify.
- Proficiency in scripting and automation (Python, Go, Bash).
- Practical experience with Terraform, Ansible, or similar IaC tools.
- Demonstrated ability to design and monitor SLIs/SLOs/SLAs and drive reliability improvements through metrics and automation.
- Experience integrating with modern observability platforms (logs, metrics, tracing).
- Ability to remain calm and structured during high-pressure incidents and operational events.
- Strong communication and collaboration skills, with the ability to support and influence cross-functional stakeholders.
- A proactive, ownership-oriented mindset with a commitment to continuous operational improvement.
Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.
Salary: $50-$55 per hour