What are the responsibilities and job description for the Technical Engineer - Site Reliability Engineer position at take2it?
Overview
This role involves providing technical expertise and support in the development, implementation, and maintenance of engineering solutions. The position focuses on designing and developing complex technical solutions, conducting system integration, testing, and validation to ensure performance and reliability. Collaboration with cross-functional teams is essential to gather requirements and offer technical guidance, along with troubleshooting and resolving issues to maintain system functionality.
Education & Certification Requirements
Not specified.
Onsite Requirements
This role is based in Raleigh, NC, and requires working onsite five days a week.
Responsibilities
- Develop, implement, and maintain engineering solutions to address complex technical challenges.
- Conduct system integration, testing, and validation to ensure optimal system performance and reliability.
- Collaborate with cross-functional teams to gather requirements and provide technical guidance.
- Troubleshoot and resolve technical issues to sustain system functionality.
- Document technical specifications, processes, and procedures for future reference and compliance.
- Manage cloud infrastructure, primarily in Azure, with proficiency in automation and infrastructure as code tools such as Terraform and Ansible.
- Implement and monitor SLIs, SLOs, and SLAs to systematically improve system reliability.
- Integrate systems with observability platforms to ensure comprehensive system visibility.
- Apply scripting and automation skills in Python, Bash, or Go to enhance system efficiency and reliability.
- Maintain a proactive, ownership-driven approach in continuous system improvement.
Qualifications
- Minimum of 3 years of experience in site reliability engineering or related roles.
- Proven expertise in Azure infrastructure, with intermediate knowledge of Ansible, Terraform, and Linux (Red Hat).
- Experience working with cloud platforms, Linux (RHEL 7), Windows Server 2019, and networking fundamentals.
- Strong understanding of networking and storage technologies such as NFS, SAN, and NAS.
- Knowledge of authentication and naming services including DNS, LDAP, Kerberos, and Centrify.
- Proficiency in scripting and automation, including Python, Bash, or Go.
- Practical experience with infrastructure as code tools like Terraform and Ansible.
- Ability to define and manage SLIs, SLOs, and SLAs, with a focus on reducing operational toil.
- Ability to integrate systems with observability platforms and apply metrics-driven approaches.
- Ability to stay calm under pressure, especially during incidents and outages, and to follow a structured incident response process.
- Strong communication and collaboration skills across engineering and business teams.
Desired Skills
- Experience with Windows Server environments and networking beyond a basic level.
- Familiarity with storage area networks (SAN) and network-attached storage (NAS).
- Knowledge of advanced monitoring and observability tools.
- Exposure to scripting in languages beyond Python and Bash.
- Previous experience in financial or highly regulated industries.