Cloud Operations/Platform Manager

ALTEN
Alpharetta, GA Full Time
POSTED ON 1/8/2026
AVAILABLE BEFORE 3/7/2026
Job Description - Manager, Cloud/Platform Operations

Location: Atlanta / Roswell, GA (Onsite with Offshore Team Management)

Role Summary

Rheem is seeking a Manager, Cloud Operations to lead, transform, and scale its digital operations landscape across CloudOps, SRE, NOC, Observability, AIOps, and MLOps. This individual will serve as the single point of accountability for operational stability and innovation, managing offshore teams while working closely with Rheem's U.S. digital leadership.

This role is not a steady-state manager position. The successful candidate will:

  • Identify operational gaps.
  • Suggest and implement best practices and tools.
  • Introduce automation and innovation strategies.
  • Guide daily deliverables for offshore teams.
  • Demonstrate tangible business impact each quarter (improved uptime, reduced MTTR, cost savings, predictive alerting, etc.).

The Manager will report to the Director of Digital Operations and act as Rheem's Cloud Operations Leader in practice.

Key Responsibilities

Operations Strategy & Governance

  • Define the vision, strategy, and roadmap for CloudOps, Reliability, and Operational Excellence.
  • Establish KPIs and OKRs aligned with Rheem's business goals (availability, MTTR, cloud cost per device, customer churn reduction).
  • Deliver quarterly impact reports to business leadership showcasing operational improvements and ROI.

Cloud Operations & FinOps

  • Own multi-region cloud operations across AWS and Azure platforms.
  • Drive cost transparency and optimization via FinOps practices and dashboards.
  • Build capacity and resiliency models for predictable operations.
  • Conduct resiliency drills and game days to ensure high availability and compliance.

Site Reliability Engineering (SRE)

  • Establish SLIs, SLOs, and error budgets to measure reliability.
  • Build incident management playbooks and drive blameless postmortems.
  • Proactively improve reliability through automation, self-healing, and continuous testing.

Network Operations Center (NOC) Modernization

  • Transform NOC from alert-driven to predictive, AIOps-enabled operations.
  • Consolidate monitoring tools and reduce alert fatigue with intelligent correlation.
  • Ensure 24x7 support coverage through offshore team alignment and escalation management.

Observability & Telemetry

  • Build a unified observability stack (logs, metrics, traces, RUM) leveraging OpenTelemetry.
  • Enable business-oriented dashboards (device uptime, customer adoption, churn trends).
  • Ensure end-to-end visibility from connected devices → cloud microservices → customer-facing apps.

AIOps & MLOps [optional]

  • Deploy AIOps solutions for anomaly detection, predictive alerts, event correlation, and automated remediation.
  • Operationalize ML models: rollout, monitoring, drift detection, rollback strategies.
  • Showcase measurable value, e.g., warranty claim reduction, improved customer experience metrics.

Process Innovation & Automation

  • Audit current toolchain and processes; identify redundancies, gaps, and opportunities for automation.
  • Align with DevOps/SecOps to streamline release-to-operations handshakes.
  • Drive Infrastructure-as-Code for operations (Terraform, Ansible, GitOps).

Team Leadership & Offshore Management

  • Manage and mentor a distributed team (offshore and onsite), setting clear goals and accountability.
  • Define roles, responsibilities, and shift structures for 24x7 global coverage.
  • Build a culture of continuous improvement and operational excellence.

Compliance, Security & Risk

  • Ensure Rheem operations align with compliance standards (SOC2, ISO, HIPAA where applicable).
  • Own business continuity planning and disaster recovery testing.
  • Proactively identify operational risks and mitigate before they impact business.

Business Alignment & Change Leadership

  • Act as the voice of operations at business leadership tables.
  • Translate technical improvements into business outcomes (lower churn, improved uptime, faster installs, fewer complaints).
  • Champion a quarterly innovation agenda to showcase improvements in uptime, cost, and reliability.

Qualifications

Must-Have

  • Experience & Leadership
    • 10 years of experience in Cloud Operations, Site Reliability Engineering, or Digital Operations.
    • Proven track record of owning operational outcomes (uptime, MTTR, cost optimization, observability).
    • Experience managing offshore/global delivery teams with 24x7 coverage.
    • Strong leadership presence — able to act as a change agent, operate autonomously, and deliver measurable outcomes without day-to-day direction.
  • Cloud & Technical Expertise
    • Hands-on experience with AWS and/or Azure (multi-account, multi-region operations).
    • Solid expertise with observability & monitoring tools (Datadog, Dynatrace, Splunk, Grafana, Prometheus, ELK/EFK).
    • Familiarity with Infrastructure-as-Code (Terraform, Ansible, GitOps).
    • Strong understanding of SRE principles (SLIs, SLOs, error budgets, incident management frameworks).
  • Process & Governance
    • Demonstrated ability to design and implement operations frameworks (Ops playbooks, NOC modernization, incident command systems).
    • Knowledge of FinOps practices (cloud cost visibility, optimization, showback/chargeback).
    • Experience ensuring compliance with SOC2, ISO, HIPAA or equivalent standards.
  • Soft Skills
    • Excellent stakeholder communication skills — ability to link operational KPIs with business outcomes.
    • Strong team leadership and mentoring skills, especially across distributed teams.

Nice-to-Have

  • Exposure to AIOps platforms (Moogsoft, BigPanda, OpsRamp, ServiceNow AI modules).
  • Experience with MLOps tooling (MLflow, Kubeflow, SageMaker, Azure ML) for model deployment and monitoring.
  • Prior background in platform operations at a product/SaaS company (as opposed to pure IT Ops).
  • Experience leading automation-first initiatives (predictive alerts, self-healing infra, auto-remediation pipelines).
  • Hands-on experience with CI/CD → Ops handshakes and change-impact assessments.
  • Cloud certifications:
    • AWS Certified Solutions Architect / DevOps Engineer
    • Microsoft Certified: Azure Administrator / Solutions Architect
    • FinOps Certified Practitioner

Salary.com Estimation for Cloud Operations/Platform Manager in Alpharetta, GA
$124,849 to $154,667