Full Time | Education & Training Services | 1 Week Ago
Chabez Tech is hiring a Kubernetes Site Reliability Engineer (Java background) - Onsite near Atlanta, GA
Job Details
Role: Kubernetes Site Reliability Engineer (Java background)
Location: Atlanta, GA or Frisco, TX (onsite)
Duration: Long Term
The Site Reliability Engineer, ACE Platform Engineering, will support the critical API Platform, DevOps, and other activities for the Digital Services Group.
Job Description:
Monitor production systems and create complex alerts and dashboards for them.
Provide capacity analysis and tuning analysis for cloud applications on Linux and container platforms.
Be available to provide 24x7 on-call support on a rotating basis with other team members.
Lead efforts in troubleshooting, recovery, and root cause investigation.
Analyze user requirements and problems to automate or improve systems, and review system capabilities, workflow, and scheduling limitations.
Follow and develop detailed work plans, schedules, project estimates, resource plans, and status reports.
Facilitate DR (Disaster Recovery) exercises to ensure that the team is fully prepared for any event.
Lead root cause analysis sessions to understand what causes issues in production and devise solutions that prevent them from recurring.
Ensure documentation is created and kept up to date for any related work.
Maintain a strong understanding of UNIX operating systems and at least one scripting language.
Evaluate product and service solutions.
Skill requirements:
Strong hands-on experience in Kubernetes, infrastructure, and support.
Strong experience in DevOps practices for microservices using Kubernetes as the orchestrator.
Strong experience with cloud configurations and services.
Strong experience in API microservices.
Experience with tools such as NGINX, Docker, Postman, SoapUI, ELK, Splunk, AppDynamics, CI/CD tools, and GitLab.
Good experience in performance measurement and tuning, capacity planning and management, and contingency and disaster recovery.
Strong scripting knowledge and experience.
Good understanding of networking and routing.