What are the responsibilities and job description for the Site Reliability Engineer, Consultant position at Blue Shield of CA?

Your Role

We are seeking an experienced Site Reliability Engineer (SRE) to lead reliability, scalability, and performance initiatives across our production systems. In this role, you will blend software engineering, automation, and systems operations to ensure that our platforms are resilient, efficient, and continuously improving.
You will be part of a cross-functional team responsible for designing, implementing, and maintaining reliable systems that support millions of requests daily. This position requires a deep understanding of distributed systems, cloud infrastructure, automation, and incident response.

Your Knowledge and Experience

Education & Experience

Requires a Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent practical experience); Master's degree a plus.
Requires at least 7 years of experience building, supporting, and improving production systems and infrastructure.

Cloud Platforms

Minimum 5 years of hands-on experience with Azure, AWS, or GCP.
Demonstrated expertise in virtual machines (VMs), containers, cloud networking, identity and access management (IAM), monitoring, storage, and serverless functions.
Comfortable deploying and managing cloud-native services and infrastructure.

Programming & Scripting

Proficiency in one or more languages such as Python, Go, Java, Bash, PowerShell, or similar.
Ability to write clean, maintainable code for automation and tooling.

Containerization & Orchestration

Experience working with Kubernetes, Docker, and tools like Helm or Red Hat OpenShift.
Familiarity with managing containerized applications in production environments.

Monitoring & Observability

Working knowledge of tools such as Prometheus, Grafana, Datadog, New Relic, ELK Stack, Dynatrace, Splunk, BigPanda, or SolarWinds.
Ability to set up dashboards, alerts, and metrics to ensure system health and performance.

CI/CD & Configuration Management

Experience with CI/CD pipelines using tools like Jenkins, GitHub Actions, GitLab CI, Argo CD, or Spinnaker.
Familiarity with configuration management tools such as Ansible, Chef, or Puppet.

Automation & Emerging Technologies

Understanding of agentic AI systems and automation frameworks for incident response and infrastructure optimization is a plus.
Interest in exploring intelligent automation to improve reliability and reduce manual toil.

Testing & Deployment Expertise

Experience with chaos engineering tools (e.g., Gremlin, Chaos Monkey) and methodologies.
Hands-on knowledge of Blue/Green and Canary deployment strategies in cloud-native environments.

External hires must pass a background check/drug screen. Qualified applicants with arrest records and/or conviction records will be considered for employment in a manner consistent with Federal, State and local laws, including but not limited to the San Francisco Fair Chance Ordinance. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, protected veteran status, disability status, or any other classification protected by Federal, State and local laws.
