What are the responsibilities and job description for the Site Reliability Engineer position at Cyberobotix?

Job Title: Senior Site Reliability Engineer (SRE)

Location: Atlanta, GA (Hybrid)

Duration: Long Term

Work Mode: A rotating hybrid schedule. Team A works from the office on Thursday and Friday, then on Monday, Tuesday, and Wednesday (Saturday and Sunday are days off), while Team B works remotely during the same period. The schedule then rotates, with Team B working from the office and Team A working remotely.

Job Description:

Required Skillset

Manage and optimize data streaming and API components in on-premises OpenShift and AWS.
Proactively review the application’s APIs and processes to identify opportunities to optimize the response times for various application components.
Automate various types of testing, including data quality checks, as well as delivery and deployment to production.
Develop integrations between the application (on premises and in AWS) and our third-party tools (ServiceNow, VersionOne, Sumo).
Work with teams to create SLIs and SLOs.
Actively monitor the platform applications and lead troubleshooting of degraded performance and hard-to-define issues, develop solutions, and document root cause analysis artifacts in the backlog.
Evolve the cloud infrastructure ecosystem for our application suite by experimenting with emerging technologies and completing prototypes to understand their benefits.
Design and develop CI/CD pipelines to deploy various application artifacts, including APIs and Data Process Jobs.
Analyze, design, and develop the artifacts to configure monitoring and alerting metrics so that support engineers can validate, troubleshoot, and resolve issues proactively and promptly.
Maintain data integrity and access control by using AWS security tools and services such as HSM, IAM, etc.
Understand and develop tools to monitor AWS billing for services, generate cost-related reports, and help develop and implement cost optimization strategies.
Work with enterprise security architects to design and implement data security tools, measures, data encryption, and key management.
Design and develop solutions to address security vulnerabilities discovered by internal security audit teams, vendors, and the security community.
Design and develop solutions that let support teams regularly scan for, review, and fix security issues.
Regularly and proactively monitor and analyze the capacity and performance of the platform.
Work with the architecture team to design and implement elastic infrastructure to accommodate irregular bursts of user traffic/requests.
Work with the architecture team to develop backup strategies and implement backup solutions for critical data and application components for service restoration and disaster recovery purposes.
Work with architecture, infrastructure, and application teams to provide input on continuous improvement in design, performance, and security enhancements.

Desired Skillset

Deep understanding of the operations of AWS cloud platforms.
Must be well versed in automation, scripting, and monitoring, including the use of platforms, tools, and languages such as OpenShift, CloudFormation, Terraform, Ansible, Shell, and Python.
Candidates with significant technical knowledge across infrastructure layers are preferred, including:
Linux OS
Major virtualization platforms
Traditional and software-defined networking
Load balancers
Firewalls
API tools
Performance and intelligent monitoring tools
Storage
Backup strategy
Significant knowledge and experience in end-to-end operations for enterprise systems and applications, including driving issue resolution for mission-critical systems.
Must have experience working to automate, operationalize, and improve Development/QA using CI/CD tools (GitLab, GitHub, Jenkins, Maven, Gradle, Nexus).
Working experience with Software Release Management.

Desired Qualification

BS degree in Computer Science or a related technical field, or equivalent practical experience.

Minimum Experience

3 years of related DevOps or SysOps engineering experience with a focus on major cloud platforms (AWS preferred).
2 years of application development experience, including data streaming and deploying/monitoring high availability critical application components.
1 year in a Site Reliability Engineering (SRE) organization preferred.
Overall 4–6 years of experience.
