What are the responsibilities and job description for the Site Reliability Engineering (SRE) position at ZODHA SOLUTIONS?
Site Reliability Engineering (SRE)
Introduction:
As a Site Reliability Engineer (SRE) at our company, you will play a crucial role in ensuring the reliability and performance of our systems and applications. You will work closely with cross-functional teams to design, build, and maintain scalable and reliable infrastructure to support our products and services.
Responsibilities:
- Design, implement, and maintain monitoring and alerting systems to ensure the availability and performance of our services
- Automate repetitive tasks to streamline operations and improve efficiency
- Collaborate with software engineers to optimize application performance and reliability
- Participate in the on-call rotation to respond to incidents and troubleshoot issues promptly
- Conduct post-incident reviews and implement measures to prevent recurrence
- Stay up to date with industry best practices and technologies to continuously improve our systems
Requirements:
Required Skills:
- Minimum 14 years of experience in Site Reliability Engineering
- Minimum 8 years of experience working with AWS services
- Minimum 5 years of experience using Dynatrace for monitoring and performance management
- Minimum 3 years of experience with OpenTelemetry for observability and monitoring
Preferred Skills:
- Experience with containerization technologies such as Docker and Kubernetes
- Strong programming skills in languages like Python, Java, or Go
- Certifications in AWS or other cloud platforms
- Excellent problem-solving and communication skills