
Principal Site Reliability Engineer

Crusoe
San Francisco, CA · Full Time
POSTED ON 12/13/2025
AVAILABLE BEFORE 2/13/2026
Crusoe's mission is to accelerate the abundance of energy and intelligence. We’re crafting the engine that powers a world where people can create ambitiously with AI, without sacrificing scale, speed, or sustainability.

Be a part of the AI revolution with sustainable technology at Crusoe. Here, you'll drive meaningful innovation, make a tangible impact, and join a team that’s setting the pace for responsible, transformative cloud infrastructure.

About This Role

As a Principal Site Reliability Engineer, you will play a critical role in designing and operating a next-generation NeoCloud built for AI, GPU, and high-performance workloads. This role sits at the intersection of infrastructure architecture, reliability engineering, and technical leadership. You’ll set reliability strategy, influence platform design, and ensure the cloud scales safely, efficiently, and predictably as customer demand accelerates.

You are a hands-on technical leader who thrives in complex distributed systems, drives clarity in ambiguous environments, and raises the bar for operational excellence across the organization.

What You’ll Be Working On

  • Define and own the reliability architecture for a NeoCloud platform supporting GPU-dense, latency-sensitive, and large-scale distributed workloads
  • Design and evolve SLOs, SLIs, and error budgets that meaningfully balance reliability, velocity, and customer experience
  • Lead incident response strategy for high-severity events, including root cause analysis and long-term remediation
  • Architect and improve observability systems (metrics, logs, tracing) to support rapid detection and diagnosis at scale
  • Partner with Infrastructure, Networking, Hardware, and Platform teams to influence system design before production issues occur
  • Drive automation across provisioning, deployment, capacity management, and failure recovery
  • Establish best practices for on-call health, operational readiness, and production change management
  • Serve as a technical authority and mentor for senior and staff-level engineers across the SRE and infrastructure org

What You’ll Bring to the Team

  • 10 years of experience operating and scaling large-scale distributed systems in production environments
  • Deep expertise in SRE principles: reliability modeling, incident management, toil reduction, and systems thinking
  • Strong background in cloud or infrastructure platforms (public cloud, private cloud, or NeoCloud environments)
  • Hands-on experience with Kubernetes and containerized workloads at scale
  • Proficiency in one or more programming languages (Go, Python, Rust, or similar) with production-grade code ownership
  • Strong understanding of Linux systems, networking fundamentals, and performance bottlenecks
  • Proven ability to lead through influence, setting direction across teams without direct authority
  • Exceptional communication skills, especially during high-stakes incidents and cross-functional decision-making

Bonus Points

  • Experience supporting GPU-based, AI/ML, or HPC workloads
  • Familiarity with bare-metal provisioning, hardware lifecycle management, or data center operations
  • Experience building or scaling a NeoCloud or cloud-adjacent platform from early growth to maturity
  • Background in capacity planning for GPU, storage, or high-throughput networking environments
  • Passion for sustainable infrastructure or next-generation cloud architectures

Benefits

  • Industry-competitive pay
  • Restricted Stock Units in a fast-growing, well-funded technology company
  • Health insurance package options that include HDHP and PPO, vision, and dental for you and your dependents
  • Employer contributions to HSAs
  • Paid parental leave
  • Paid life insurance, short-term and long-term disability
  • Teladoc
  • 401(k) with a 100% match up to 4% of salary
  • Generous paid time off and holiday schedule
  • Cell phone reimbursement
  • Tuition reimbursement
  • Subscription to the Calm app
  • MetLife Legal
  • Company-paid commuter benefit of $300 per month

Compensation

Compensation will be paid in the range of $261,000 - $326,000, plus bonus. Restricted Stock Units are included in all offers. Compensation is determined by the applicant’s education, experience, knowledge, skills, and abilities, as well as internal equity and alignment with market data.

Crusoe is an Equal Opportunity Employer. Employment decisions are made without regard to race, color, religion, disability, genetic information, pregnancy, citizenship, marital status, sex/gender, sexual preference/orientation, gender identity, age, veteran status, national origin, or any other status protected by law or regulation.

Salary: $67 - $89



Job openings at Crusoe

  • Crusoe Tulsa, OK
  • Crusoe's mission is to accelerate the abundance of energy and intelligence. We’re crafting the engine that powers a world where people can create ambitious...
  • 13 Days Ago

  • Crusoe Amarillo, TX
  • Crusoe's mission is to accelerate the abundance of energy and intelligence. We’re crafting the engine that powers a world where people can create ambitious...
  • 13 Days Ago

  • Crusoe San Francisco, CA
  • Crusoe's mission is to accelerate the abundance of energy and intelligence. We’re crafting the engine that powers a world where people can create ambitious...
  • 13 Days Ago

  • Crusoe San Francisco, CA
  • Crusoe's mission is to accelerate the abundance of energy and intelligence. We’re crafting the engine that powers a world where people can create ambitious...
  • 13 Days Ago


Not the job you're looking for? Here are some other Principal Site Reliability Engineer jobs in the San Francisco, CA area that may be a better fit.

  • Early Warning® San Francisco, CA
  • At Early Warning, we’ve powered and protected the U.S. financial system for over thirty years with cutting-edge solutions like Zelle®, Paze℠, and so much m...
  • 9 Days Ago

  • Roblox San Mateo, CA
  • Every day, tens of millions of people come to Roblox to explore, create, play, learn, and connect with friends in 3D immersive digital experiences – all cre...
  • 16 Days Ago
