What are the responsibilities and job description for the Java-SRE engineer position at Atos?
- System Reliability & Performance
  - Ensure applications built in Java run reliably, with minimal downtime.
  - Define and monitor SLIs (Service Level Indicators), SLOs (Service Level Objectives), and error budgets to measure reliability.
- Incident Management
  - Troubleshoot production issues, perform root cause analysis, and implement permanent fixes.
  - Automate repetitive operational tasks to reduce manual intervention.
- Monitoring & Observability
  - Build dashboards using tools such as Prometheus, Grafana, and the ELK stack.
  - Implement logging and tracing for distributed Java applications.
- Automation & CI/CD
  - Develop scripts and pipelines for deployment, scaling, and rollback.
  - Integrate with DevOps practices for continuous delivery.
- Collaboration
  - Work closely with developers to design reliable systems.
  - Partner with operations teams to ensure smooth deployments and upgrades.
Required Skills
- Programming: Strong expertise in Java.
- Cloud Platforms: AWS, Azure, GCP; Docker for containerization and Kubernetes for container orchestration.
- DevOps Tools: Jenkins, GitHub Actions, Terraform, Ansible, Chef.
- Monitoring & Observability: Prometheus, Grafana, ELK, Jaeger.
- OS & Networking: Linux administration, DNS, load balancing, distributed systems.
- Soft Skills: Problem-solving, communication, collaboration across dev and ops teams.