Senior Site Reliability Engineer / HPC - Pre-IPO Tech Leader

Andiamo
York, NY Full Time
POSTED ON 6/1/2026
AVAILABLE BEFORE 8/2/2026
About The Role

We are seeking a highly skilled Senior Site Reliability Engineer (SRE) / High-Performance Computing (HPC) Engineer to design, build, and operate the large-scale infrastructure that powers a $2.5B pre-IPO technology company. Our systems run on massive distributed clusters, handling some of the most demanding workloads in cloud, AI, and data-driven computing.

In this role, you will be responsible for ensuring the reliability, scalability, and performance of mission-critical platforms. You will optimize HPC workloads, streamline CI/CD for large-scale clusters, and enable research and product teams to deliver innovations with speed and confidence. This is a hands-on position with the opportunity to influence architecture, lead reliability initiatives, and solve some of the hardest problems in distributed systems and performance engineering.

What You’ll Do

  • Design Reliable Infrastructure: Architect and maintain large-scale, distributed HPC and cloud-native systems with a focus on uptime, scalability, and resilience.
  • Optimize HPC Workloads: Tune scheduling, job orchestration, and performance for compute- and memory-intensive workloads (AI/ML, simulations, large-scale analytics).
  • Build Observability: Implement monitoring, logging, and alerting systems that provide full visibility into cluster and service health.
  • Automate Everything: Develop tooling and automation for provisioning, scaling, and recovery of critical systems.
  • Ensure Security & Compliance: Implement best practices for access control, encryption, and governance across HPC and cloud environments.
  • Collaborate Cross-Functionally: Work with engineering, research, and product teams to deliver reliable infrastructure for next-gen applications.
  • Incident Response: Lead troubleshooting, root cause analysis, and postmortems for high-severity incidents.

What We’re Looking For

  • Professional Experience: 7 years of experience in SRE, infrastructure engineering, or HPC roles, with a proven track record of supporting large-scale distributed systems.
  • Technical Skills: Expertise in Linux systems, Python or Go, and infrastructure-as-code (Terraform, Ansible, or similar).
  • HPC Expertise: Strong knowledge of job schedulers (Slurm, Kubernetes, or Mesos), workload managers, and parallel/distributed computing.
  • Cloud & Hybrid: Hands-on experience with AWS, GCP, or Azure in combination with on-premises HPC clusters.
  • Observability: Proficiency with monitoring and logging frameworks (Prometheus, Grafana, ELK, OpenTelemetry).
  • Resilience Engineering: Experience with chaos engineering, failure testing, and disaster recovery planning.
  • Collaboration: Strong communication skills and the ability to work with research scientists, engineers, and operations teams.
  • Education: Bachelor’s or Master’s degree in Computer Science, Engineering, or related field.

Why Join

This is an opportunity to join a pre-IPO technology leader valued at $2.5B, at a time of rapid growth and innovation. As a Senior SRE / HPC Engineer, you will shape the infrastructure that powers next-generation AI, analytics, and large-scale computing. You’ll solve some of the most complex reliability and performance challenges, collaborate with world-class teams, and play a key role in preparing the company for IPO and beyond. The scale is massive, the challenges are unique, and your impact will be immediate.

About Andiamo

Talent Partners for the AI Revolution. As a globally recognized staffing and consulting firm, we specialize in placing the top 2% of technology and go-to-market professionals with the world’s largest and most well-known companies.

For over 20 years, we've been a tier-one vendor for firms such as Palantir, Amazon, Fluidstack, Bloomberg, Relativity Space, Firefly, MasterCard, Visa, Two Sigma, and Citadel, as well as other major financial services firms, elite hedge funds, Google-backed tech start-ups, and major software firms.

Our talent solutions include Permanent Placement, Contract Staffing, Executive Search, and Dedicated Recruiting Services (RPO). Find out more at www.andiamogo.com.

Salary.com Estimation for Senior Site Reliability Engineer / HPC - Pre-IPO Tech Leader in York, NY
$123,644 to $144,515
