What are the responsibilities and job description for the Site Reliability Engineer position at Galactic Minds INC?
Role: Senior SRE (Site Reliability Engineer)
Location: Atlanta, GA (onsite from day one; local candidates only)
Type: Contract
Job Description
Qualifications
- Extensive experience supporting AWS production systems (EC2, VPC, ALB/NLB, RDS, Lambda, EKS)
- Skilled in incident management and 24x7 production support
- Proficient with monitoring tools: CloudWatch, Dynatrace, Quantum Metrics
- Strong troubleshooting across infrastructure, networking, and application layers
- Familiar with CI/CD pipelines and AWS deployment processes
Responsibilities
- Provide L1/L2 support for AWS applications and infrastructure incidents
- Triage and resolve issues, restore services within SLAs, and escalate code defects with clear diagnostics
- Participate in on-call rotations, major incident bridges, and post-incident reviews
- Analyze defects, configuration issues, and anomalies identified through monitoring tools or user reports
- Perform regular health checks on AWS services
- Monitor system health using CloudWatch, Dynatrace, Quantum Metrics, and ThousandEyes
- Respond proactively to issues with resource usage, latency, errors, and availability
- Maintain and enhance dashboards for observability