What are the responsibilities and job description for the Sr. Site Reliability Engineer-W2 position at Bahwan CyberTek Inc.?
Job Title: Sr. Site Reliability Engineer (SRE)
Duration: 9-12 Months Contract to Hire
Location: Kalamazoo, MI (3 days a week hybrid)
Summary:
The Site Reliability Engineer (SRE) is responsible for improving the reliability, availability, performance, and operability of the client’s supported software systems. This role combines software engineering and IT operations to automate operational work, monitor system performance, and reduce toil. The SRE establishes and manages monitoring, alerting, incident response, and problem management practices to ensure applications remain available and performant during updates and failures. The role partners with engineering, architecture, and product teams to define reliability standards and production readiness requirements. SRE is a practical implementation of DevOps focused on maintaining software quality in fast-paced development environments.
Job Description:
Required:
- Strong grounding in SRE/DevOps practices: incident management, blameless postmortems, SLOs/SLIs, error budgets, production readiness.
- Experience building/operating monitoring and alerting, and using logs/metrics to diagnose issues.
- Automation/scripting skills (e.g., Python, PowerShell, Bash) and ability to reduce manual operational work.
- Strong understanding of cloud-based platforms such as Azure Databricks Unity Catalog, AWS S3, and RDS.
- Strong experience in ETL/ELT work.
- Understanding of CI/CD concepts, safe deployment patterns, rollback strategies, and change risk controls.
Preferred:
- Experience with cloud environments and infrastructure-as-code.
- Experience with large datasets (multi-million-row datasets).
- Experience with container orchestration and modern runtime platforms (where applicable).
- Experience building dashboards and reliability reporting for executives and delivery teams.