What are the responsibilities and job description for the Site Reliability Engineer (SRE) position at innovitusa?
Hiring: W2 Candidates Only
Visa: Open to any visa type with valid work authorization in the USA

We are seeking a highly skilled Site Reliability Engineer (SRE) to build, scale, and maintain our production infrastructure. The ideal candidate blends software engineering expertise with strong operational discipline. You will ensure the reliability, availability, security, and performance of our cloud-based systems while driving automation and continuous improvement across engineering teams.

Key Responsibilities
Design, build, and manage highly scalable and reliable infrastructure across cloud environments (AWS/Azure/GCP).
Develop automation for deployment, monitoring, scaling, and recovery using tools such as Terraform, Ansible, Helm, or CloudFormation.
Implement CI/CD pipelines and partner with development teams to improve deployment velocity and operational stability.
Monitor system performance using tools such as Prometheus, Grafana, Datadog, the ELK Stack, or CloudWatch.
Perform incident response, root cause analysis (RCA), and postmortems to drive continuous improvement.
Build and maintain robust alerting systems and SLOs/SLIs to uphold service-level reliability targets.
Improve system resilience through capacity planning, chaos engineering, fault-tolerance testing, and disaster recovery strategies.
Maintain and strengthen the security posture, ensure compliance, and enforce operational best practices.
Manage containers and orchestration platforms such as Docker and Kubernetes at scale.
Collaborate with cross-functional teams to drive reliability, performance tuning, and cost optimization.
Required Skills & Qualifications
Bachelor’s degree in Computer Science, Engineering, or a related technical field.
4-8 years of SRE, DevOps, or Cloud Engineering experience.
Strong proficiency in cloud platforms: AWS, Azure, or GCP.
Expertise with infrastructure-as-code tools (Terraform, CloudFormation, Pulumi, Ansible).
Hands-on experience with Kubernetes, Docker, and container orchestration.
Strong scripting/programming skills in Python, Go, Bash, or similar.
Solid understanding of networking fundamentals (DNS, TCP/IP, load balancing, VPC).
Experience with monitoring, log management, and observability tools.
Strong problem-solving, debugging, and troubleshooting skills in large-scale distributed systems.
Good communication skills and the ability to work in fast-paced, collaborative environments.

Preferred Qualifications
Experience supporting microservices-based architectures.
Knowledge of serverless technologies (AWS Lambda, GCP Cloud Functions, Azure Functions).
Experience with GitOps tools (ArgoCD, Flux).
Background in security hardening, compliance, or cloud architecture.
Familiarity with chaos engineering tools (Gremlin, LitmusChaos).
Experience in on-call rotations with strong incident management skills.
Salary: $58-$77