
Senior Site Reliability Engineer

Gradle Inc.
San Francisco, CA Full Time
POSTED ON 12/16/2025
AVAILABLE BEFORE 2/16/2026
Who We Are

Develocity is a first-of-its-kind toolchain observability and acceleration platform that helps software teams adopt and improve DORA capabilities (including continuous delivery) in order to achieve software delivery excellence. It combines build and test acceleration with deep observability for builds and tests with Gradle Build Tool, Apache Maven™, sbt, npm, and Python, and applies to both CI and local builds and tests. Ultimately, Develocity provides an operational layer across an organization's toolchains to speed up, troubleshoot, and optimize local developer and remote CI feedback loops.

Our software is used by some of the world's leading software organizations, such as Netflix, Airbnb, SAP, several top ten banks, and many other major customers across all verticals. We regularly collaborate with these and other users to make our products continuously better.

We have partnered with the Apache Software Foundation, the Commonhaus Foundation, the Scala Center, the Micronaut Foundation, and other OSS projects like Spring, Quarkus, Kotlin, JUnit, AndroidX, and many more to also bring the values of Develocity to the OSS community.

Our Values

  • Seek to Understand: Everything starts with listening and understanding, and we strive to understand different viewpoints, problems, and motivations. Before we take action, we ensure we truly grasp the challenges, perspectives, and goals.
  • Know the Why: We approach our work with a clear sense of purpose, ensuring every step is deliberate and focused. We take meaningful action with urgency, but never at the expense of thoughtful consideration.
  • Innovate & Iterate: We embrace challenges and are not afraid to try new things, even if they might fail. With deep understanding and a clear purpose, we can develop creative and bold solutions to tackle challenges.
  • Own the Outcome: We are empowered to take initiative and we maintain transparency in our work and its outcomes. When we execute, we take responsibility for our decisions, measure the success of our innovations, and learn from the results.

Who You Are

We're building a new SRE team and looking for founding members to help shape how we operate. You'll be responsible for the reliability, performance, and availability of Develocity instances serving paying customers, open-source projects, and public-facing services, plus supporting infrastructure like artifact registries.

You'll work on our internally built Cloud Application Platform, Kubernetes on AWS, and develop deep expertise in it. When incidents happen, you'll troubleshoot issues across the stack, from application to infrastructure. You'll collaborate with the Cloud Platform team to improve the tooling you depend on, and with engineering teams to build reliability into how we ship software. If you like automating things and hate doing the same task twice, you'll fit in well.

You'll be part of a distributed, remote-first team that values asynchronous communication and written documentation. Strong self-direction and clear communication across time zones are essential.

Responsibilities

  • Operate and maintain all Develocity instances and supporting services.
  • Participate in a follow-the-sun on-call rotation, owning incident response and troubleshooting issues across the stack.
  • Drive automation across application deployment, upgrades, monitoring, self-healing, and recovery.
  • Build and maintain observability for all managed services (logging, metrics, tracing, and alerting).
  • Work with engineering teams to build reliability into features from the start.
  • Run incident response and retrospectives, and make sure we learn from them.
  • Own disaster recovery, backups, and business continuity.
  • Communicate with customers during incidents and maintenance windows.
  • Optimize performance, resource usage, and costs.
  • Help evolve our SaaS operations as we grow.

Minimum qualifications

  • 5 years in an SRE, DevOps, or equivalent role operating production services at scale.
  • Strong Kubernetes experience in production environments.
  • Cloud infrastructure expertise, preferably AWS (EKS, RDS, S3, EC2).
  • Proficiency with observability tools (Prometheus, Grafana) and Infrastructure as Code (Terraform).
  • Track record of incident management and response.
  • Knowledge of SRE best practices (SLAs, SLOs).
  • Scripting proficiency (Python, Bash) for automation.
  • Experience with 24/7 on-call rotations.
  • Strong written and verbal English communication.

Preferred qualifications

  • Experience operating SaaS platforms at scale.
  • Familiarity with Develocity.
  • JVM language experience (Java, Kotlin).
  • Disaster recovery planning and execution experience.
  • Customer-facing incident communication skills.
  • Experience establishing SRE practices in new or growing teams.

What We Offer

  • A ground-floor role in a new SRE team: you'll shape how we do things, not inherit someone else's decisions.
  • Real ownership of production systems used by engineers at companies you've heard of.
  • Direct interaction with customers when things go wrong (and when they go right).
  • A culture that values automation over heroics.
  • In-person meetings, such as our annual company offsite and team meetings.
  • Work from home in a remote-first environment.
  • Competitive salaries and equity grants.

Compensation

The US salary range for this position is $150-190k, which reflects the target ranges for all US locations. Within this range, individual pay is determined by geographic location and additional factors including but not limited to experience, relevant skills, qualifications, seniority, performance, and travel requirements. Our recruiting team can share more information about the specific salary range for your location during the hiring process.

Location

Remote from anywhere in the PST time zone. While our team works remotely and is spread across the globe, we deeply value daily interactions and collaboration.

Salary : $67 - $89



