Senior Site Reliability Engineer

Castleton Commodities International
Stamford, CT | Full Time
POSTED ON 5/12/2026
AVAILABLE BEFORE 7/4/2026
The Senior Site Reliability Engineer is responsible for improving the reliability, availability, scalability, and operational excellence of our critical infrastructure platforms and services. This role partners closely with Engineering, Security, and Infrastructure teams to design resilient cloud-native architectures, implement Infrastructure as Code (IaC) and CI/CD standards, and drive measurable reliability outcomes. The Senior Site Reliability Engineer will also lead efforts to define and validate recovery objectives (RTO/RPO), design and implement Business Continuity / Disaster Recovery (BCP/DR) plans, and coordinate structured testing to ensure readiness.

Responsibilities:

Reliability Engineering & Operations

  • Own and improve service reliability through SLO/SLI definition, error budgets, and operational best practices.
  • Design, implement, and maintain observability (monitoring, logging, tracing, alerting) to reduce MTTR and improve proactive detection.
  • Lead incident response practices including on-call improvements, runbooks, post-incident reviews (RCA), and preventative actions.
  • Partner with application teams to improve performance, capacity planning, and resiliency under failure scenarios.

Infrastructure & Cloud Architecture

  • Design and operate highly available, fault-tolerant cloud architectures (multi-AZ and, where required, multi-region).
  • Implement resilient patterns across compute, storage, networking, and managed services (e.g., autoscaling, load balancing, backups, replication).
  • Drive cloud governance best practices (tagging, account/landing zone patterns, least privilege, guardrails) in partnership with security and platform teams.

Infrastructure as Code (IaC) & DevOps Enablement

  • Build and maintain IaC modules and standards (e.g., Terraform, CloudFormation, CDK) for repeatable, auditable infrastructure delivery.
  • Develop, standardize, and optimize CI/CD pipelines to enable safe, automated deployments (e.g., GitHub Actions, GitLab CI, Jenkins, AWS CodePipeline).
  • Promote DevOps practices: version-controlled infrastructure, automated testing, immutable deployments, and progressive delivery patterns.
  • Establish environment consistency across dev/test/stage/prod and ensure infrastructure drift detection and remediation.

BCP/DR, RTO/RPO Definition & Testing

  • Collaborate with stakeholders to evaluate and define service-level RTO and RPO targets based on business and technical requirements.
  • Design and implement BCP/DR architectures and procedures (backups, restore workflows, replication, failover/failback, data integrity validation).
  • Coordinate and execute structured DR tests (tabletop, simulation, partial failover, full failover) and document outcomes.
  • Maintain DR runbooks, dependency maps, and recovery checklists; drive remediation of gaps identified during testing.
  • Produce metrics and reporting on DR readiness, test results, and continuous improvement actions.

Qualifications:

  • 7 years of experience in SRE, DevOps, Platform Engineering, or Systems Engineering roles supporting production environments.
  • Strong proficiency with observability platforms (e.g., Datadog, Prometheus/Grafana, ELK/OpenSearch, Nagios, Nimsoft).
  • Strong hands-on AWS experience building and operating production systems.
  • Proven expertise with Infrastructure as Code (Terraform and/or CloudFormation/CDK).
  • Strong CI/CD and automation background (pipeline design, deployment strategies, testing automation).
  • Experience defining and validating RTO/RPO, and implementing BCP/DR plans with structured testing.
  • Experience with Kubernetes and auto-scaling container platforms (EKS, ECS, or Kubernetes on-prem).
  • Strong Linux fundamentals, networking concepts (DNS, TCP/IP, load balancing), and troubleshooting skills.
  • Proficiency in at least one scripting/programming language (Python, Go, Bash, or similar).
  • Ability to write clear operational documentation, runbooks, and post-incident reports.
  • Ability to work effectively in a fast-paced, dynamic, high-intensity environment (including an open floor plan, if applicable to the position), with timely responsiveness and the ability to work beyond normal business hours when required.

Preferred Qualifications:

  • Familiarity with Azure and/or Oracle Cloud (OCI).
  • Familiarity with Service Mesh, API Gateways, and distributed tracing tooling.
  • Familiarity with OpenTelemetry, including client instrumentation and collector configuration.
  • Security and compliance familiarity in cloud environments (IAM design, secrets management, audit logging).
  • Experience implementing progressive delivery (blue/green, canary), feature flags, and automated rollback.
  • Relevant certifications (AWS Solutions Architect/DevOps Engineer, Kubernetes CKA/CKAD).
  • Experience with Argo CD and Karpenter.

Employee Programs & Benefits:

CCI offers competitive benefits and programs to support our employees, their families and local communities. These include:

  • Competitive comprehensive medical, dental, retirement and life insurance benefits
  • Employee assistance & wellness programs
  • Parental and family leave policies
  • CCI in the Community: Each office has a Charity Committee, and as part of this program employees are allocated 2 days annually to volunteer at the selected charities.
  • Charitable contribution match program
  • Tuition assistance & reimbursement
  • Quarterly Innovation & Collaboration Awards
  • Employee discount program, including access to fitness facilities
  • Competitive paid time off
  • Continued learning opportunities

Visit https://www.cci.com/careers/life-at-cci/# to learn more!

Salary.com Estimation for Senior Site Reliability Engineer in Stamford, CT
$138,264 to $161,761
