What are the responsibilities and job description for the Remote - Senior Site Reliability Engineer position at Resource Informatics Group?
Site Reliability Engineer
Location: Austin, TX
Duration: Long term
Rates: DOE (best in market for sure)
Prefer locals
Visa: H1B/H1B-T/EAD/USC
Direct with customer: Yes (NO hidden layers)
Turnaround for interview and start date: 3 weeks
The Site Reliability Engineer will be responsible for ensuring the reliability, availability, performance, and scalability of production systems by applying software engineering practices to infrastructure and operations. The role partners with development teams to build resilient, observable, and automated platforms that meet defined service level objectives (SLOs).
QUALIFICATIONS:
- Experience in systems engineering, DevOps, or site reliability engineering roles - 12 yrs
- Strong experience with Linux/Unix systems and system internals - 10 yrs
- Proficiency in one or more programming/scripting languages (Python, Go, Java, Bash) - 8 yrs
- Experience designing and operating highly available, distributed systems - 12 yrs
- Strong knowledge of cloud platforms (AWS or Google Cloud Platform) and cloud-native services - 10 yrs
- Experience with containerization and orchestration (Docker, Kubernetes) - 8 yrs
- Strong understanding of monitoring, alerting, and logging concepts - 8 yrs
- Experience defining and managing SLIs, SLOs, and error budgets - 8 yrs
- Familiarity with incident management, root cause analysis (RCA), and postmortems - 8 yrs
- Experience integrating security and compliance into operational workflows - 8 yrs
- Familiarity with observability tools (Prometheus, Grafana, Application Insights, Datadog, Splunk) - 4 yrs
- Experience operating 24x7 production environments with on-call rotations - 4 yrs
- Experience with chaos engineering and resiliency testing - 4 yrs
- Experience with feature flags, canary deployments, and progressive delivery - 4 yrs
- Strong documentation skills for runbooks, dashboards, and operational standards - 4 yrs