What are the responsibilities and job description for the Observability / SRE Engineer position at Mpower Plus Rezolve AI Group LTD?
Observability / SRE Engineer
Introduction:
The Observability / SRE Engineer will play a key role in ensuring the reliability and performance of our systems by implementing logging, monitoring, and alerting solutions. This individual will work closely with the engineering and operations teams to identify and resolve issues before they disrupt operations.
Responsibilities:
- Implement and manage logging, monitoring, and alerting systems to track system performance and detect anomalies.
- Develop and maintain audit logging processes to ensure compliance with security and regulatory standards.
- Create and maintain operational monitoring and traceability tools to identify and troubleshoot issues.
- Collaborate with cross-functional teams to optimize system performance and reliability.
- Automate monitoring and alerting processes to streamline operations and improve efficiency.
- Conduct regular reviews of system performance and recommend improvements based on data analysis.
- Participate in the on-call rotation to respond to system alerts and incidents in a timely manner.
Requirements:
Required Skills:
- Expertise in logging, monitoring, and alerting systems.
- Experience with audit logging processes and tools.
- Strong understanding of operational monitoring and traceability best practices.
- Ability to work collaboratively in a fast-paced environment.
- Excellent problem-solving and communication skills.
Preferred Skills:
- Experience with cloud-based monitoring solutions.
- Knowledge of DevOps principles and practices.
- Certifications in relevant technologies (e.g., AWS Certified DevOps Engineer).
- Experience with scripting languages (e.g., Python, Bash).
Salary: $100,000 - $120,000