Senior Site Reliability Engineer

Oracle
Pleasanton, CA Full Time
POSTED ON 6/25/2026
AVAILABLE BEFORE 7/23/2026
Job Description

We are looking for a Site Reliability Engineer 3 to support mission-critical cloud services and production operations. The role focuses on improving service reliability, reducing operational risk, automating repetitive tasks, and driving faster detection and resolution of issues.

The engineer will work closely with development, infrastructure, security, and operations teams to monitor service health, troubleshoot production issues, participate in incident response, improve observability, and implement reliability best practices. This role also includes analyzing recurring failures, building automation, supporting deployments, and contributing to capacity planning, disaster recovery, and operational readiness.

Also works on a number of region/realm rollouts and deployments. Forecasts demand and responds to capacity needs. Collaborates with software development teams to develop reliable and scalable infrastructures. Performs data collection to maintain and optimize operations and reliability. Leverages knowledge to perform incident response and/or maintenance tasks. Provides health and performance reporting. Identifies opportunities for automation. Communicates about services and identifies and explains the potential impact of changes. Provides support for technology and documents incidents. Experiments with new tools, assesses their potential impact, and develops knowledge of site reliability trends.

Responsibilities

Key Responsibilities

Capacity Ingestion and Management:

  • Takes proactive steps to design and architect infrastructure and/or services to meet reliability and functionality requirements.
  • Forecasts demands for infrastructure and responds to capacity needs, ensuring systems have sufficient resources to handle current and future workloads.
  • Collaborates with the software development team to develop infrastructures and features that are reliable and scalable according to deployment requirements.
  • Independently identifies opportunities for and drives prototyping (e.g., testing new applications or infrastructures, assisting in onboarding).


Incident and Service Lifecycle Management:

  • Performs data collection, triage, technical analysis, and redirection to maintain and optimize operations and infrastructure reliability.
  • Independently monitors services, maintains up-to-date knowledge of their performance, and documents their condition.
  • Leverages comprehensive knowledge to perform incident response, root cause analyses, and/or maintenance on assigned services (e.g., software installs, version upgrades, security updates, backup and recovery).
  • Provides health and performance reporting and takes appropriate actions based on trends in data.
  • May independently perform provisioning to support infrastructure, applications, and services.
  • May perform standard and non-standard decommissioning (e.g., shutting down servers, removing data from databases) to remove objects that are no longer needed.


Automation:

  • Identifies opportunities for automation and assesses potential benefits.
  • Develops automation tools or scripts to provide solutions, gather metrics, monitor, analyze, mitigate, or remediate issues/defects within infrastructures.
  • Independently conducts testing to ensure automation performs the task correctly and produces expected results.


Technical Communication and Guidance:

  • Communicates the scale, capacity, security, performance attributes, and requirements of services and technology within and sometimes beyond the immediate team.
  • Identifies and explains the potential impact of infrastructure, feature, and tool changes, considering their impact on team operations.


Troubleshooting and Resolution:

  • Provides operational support for technology, escalating incidents and other standard and non-standard issues arising within Oracle services.
  • Participates in on-call shifts to address issues.
  • Resolves technical issues spanning various services, investigating and debugging products in order to reach SLOs (service level objectives).
  • Documents incidents and performs root cause analyses according to standard reporting methods.
  • Independently performs post-mortem procedures to prevent incident reoccurrence.


Innovation and Improvement:

  • Experiments with new tools and technologies to assess their potential impact on and improve infrastructure performance and reliability, ensuring adherence to security standards.
  • Independently identifies and executes improvements for performance bottlenecks and deployments to ensure efficient resource usage, speed, and scalability.
  • Develops knowledge of site reliability trends and shares new information with team members, management, and beyond to help others build, test, deploy and run services.
  • Performs standard and non-standard analyses and provides clear data on production to contribute to business development decisions (e.g., design changes).


Core Responsibilities

Planning & Execution:

Independently manages work, monitoring timelines and deliverables to ensure projects or initiatives stay on track and meet requirements. Proactively prioritizes work and adapts to resource or timeline shifts, suggesting adjustments to maintain project efficiency.

Collaboration & Partnership:

Collaborates across teams to align on expectations and achieve shared objectives. Builds and maintains a comprehensive understanding of business, stakeholder, and/or customer needs to build and support effective partnerships. Actively listens to diverse perspectives and asks questions to ensure understanding of others.

Problem Solving:

Independently identifies and addresses standard and non-standard issues in accordance with standard practices, escalating more complex issues as appropriate. Analyzes data and/or information from multiple sources to troubleshoot standard and non-standard errors. Contributes to knowledge sharing and best practices.

Continuous Learning:

Embraces continuous learning by actively seeking to build knowledge and new skills and/or tools and staying current with industry trends and best practices. Seeks out and leverages feedback and training to improve skills. Contributes to a culture of continuous learning and knowledge sharing with team members.

Continuous Improvement:

Develops ideas and recommends updates to increase the efficiency and effectiveness of processes, protocols, and workflows within a team. Seeks input from team members on alternative approaches and methods for improving work.

IAC: Terraform, Chef, Ansible

Languages: Python, Java, Bash

Orchestration: Kubernetes, Helm

CI/CD: Jenkins

Observability: Grafana, Prometheus

Qualifications

Minimum Job Qualifications

Education and/or Experience:

8 years of experience in software engineering, infrastructure management, or related field

OR

Bachelor's Degree in Computer Science, Engineering, or related field AND 4 years of experience in software engineering, infrastructure management, or related field

OR

Master's Degree in Computer Science, Engineering, or related field AND 2 years of experience in software engineering, infrastructure management, or related field

OR

Doctorate in Computer Science, Engineering, or related field

Job Skills:

Same skills as prior level, plus:

Operating Systems: Demonstrated ability in or knowledge of operating systems, including installing, upgrading, and troubleshooting various operating environments.

Automation Experience:

3 years of experience in automation.

Programming Experience:

3 years of experience in programming and/or scripting.

Preferred Job Qualifications

Education and/or Experience:

9 years of experience in software engineering, infrastructure management, or related field

OR

Bachelor's Degree in Computer Science, Engineering, or related field AND 5 years of experience in software engineering, infrastructure management, or related field

OR

Master's Degree in Computer Science, Engineering, or related field AND 3 years of experience in software engineering, infrastructure management, or related field

OR

Doctorate in Computer Science, Engineering, or related field AND 1 year of experience in software engineering, infrastructure management, or related field.

Automation Experience:

5 years of experience in automation.

Programming Experience:

5 years of experience in programming and/or scripting.

About Us

Only Oracle brings together the data, infrastructure, applications, and expertise to power everything from industry innovations to life-saving care. And with AI embedded across our products and services, we help customers turn that promise into a better future for all. Discover your potential at a company leading the way in AI and cloud solutions that impact billions of lives.

True innovation starts when everyone is empowered to contribute. That’s why we’re committed to growing a workforce that promotes opportunities for all with competitive benefits that support our people with flexible medical, life insurance, and retirement options. We also encourage employees to give back to their communities through our volunteer programs.

We’re committed to including people with disabilities at all stages of the employment process. If you require accessibility assistance or accommodation for a disability at any point, let us know by emailing accommodation-request_mb@oracle.com or by calling 1-888-404-2494 in the United States.

Oracle is an Equal Employment Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability and protected veterans’ status, or any other characteristic protected by law. Oracle will consider for employment qualified applicants with arrest and conviction records pursuant to applicable law.

Salary.com Estimation for Senior Site Reliability Engineer in Pleasanton, CA
$139,735 to $163,324


