What are the responsibilities and job description for the Site Reliability Engineer (SRE) / DevOps Engineer position at CyLogic?
We are seeking a Site Reliability Engineer / DevOps Engineer to support the build, operation, and continuous improvement of reliable and scalable platforms. This role focuses on hands-on execution, operational support, and automation, working closely with senior engineers and platform teams.
You will contribute to improving system performance, deploying applications, and maintaining observability and CI/CD processes while developing deeper expertise across infrastructure and engineering practices.
Responsibilities/Duties:
- Support infrastructure implementation using Infrastructure as Code (IaC) tools
- Assist in building and maintaining CI/CD pipelines (e.g., Jenkins)
- Execute automation tasks for provisioning, deployment, and configuration
- Contribute to improving existing automation workflows
- Support and operate monitoring and logging systems using OpenSearch, Elasticsearch, and related tools
- Build dashboards and alerts using predefined standards
- Participate in incident response and troubleshooting efforts
- Assist with root cause analysis and documentation
- Deploy and manage containerized applications using Docker and Kubernetes
- Perform routine operational tasks such as scaling, monitoring, and updates
- Troubleshoot issues in container environments
- Use Terraform to support infrastructure provisioning
- Use Vault to access and consume secrets
- Assist in maintaining consistency across environments
- Build and maintain dashboards using Grafana
- Support logging and telemetry pipelines for observability
- Assist in implementing APM solutions under guidance
- Other duties as assigned
Experience and Core Competencies:
- 3–5 years of experience in DevOps, SRE, or related roles
- Experience working with CI/CD tools such as Jenkins
- Familiarity with Elasticsearch / OpenSearch platforms
- Hands-on experience with Docker and Kubernetes
- Working knowledge of Terraform and basic HashiCorp tooling
- Experience with Grafana, logging, and monitoring systems
- Basic scripting knowledge (Python, Bash, or similar)
- Understanding of Linux systems administration
- Exposure to cloud platforms (AWS, Azure, or GCP)
- Exposure to SRE practices (incident response, monitoring)
- Familiarity with distributed tracing concepts
- Understanding of application deployment workflows
- Interest in automation and reliability engineering
- Strong troubleshooting and analytical skills
- Ability to follow established processes and standards
- Willingness to learn and grow in a fast-paced environment
- Team-oriented and collaborative mindset
Physical Requirements:
- Lifting up to 50 pounds
- Frequent walking, standing, bending, and sitting