What are the responsibilities and job description for the Cloud Operations/Platform Manager position at ALTEN?

Job Description - Manager, Cloud/Platform Operations

Location: Atlanta / Roswell, GA (Onsite with Offshore Team Management)

Role Summary

Rheem is seeking a Manager, Cloud Operations to lead, transform, and scale its digital operations landscape across CloudOps, SRE, NOC, Observability, AIOps, and MLOps. This individual will serve as the single point of accountability for operational stability and innovation, managing offshore teams while working closely with Rheem's U.S. digital leadership.

This role is not a steady-state manager position. The successful candidate will:

Identify operational gaps.
Suggest and implement best practices and tools.
Introduce automation and innovation strategies.
Guide daily deliverables for offshore teams.
Demonstrate tangible business impact each quarter (improved uptime, reduced MTTR, cost savings, predictive alerting, etc.).

The Manager will report to the Director of Digital Operations and act as Rheem's Cloud Operations Leader in practice.

Key Responsibilities

Operations Strategy & Governance

Define the vision, strategy, and roadmap for CloudOps, Reliability, and Operational Excellence.
Establish KPIs and OKRs aligned with Rheem's business goals (availability, MTTR, cloud cost per device, customer churn reduction).
Deliver quarterly impact reports to business leadership showcasing operational improvements and ROI.

Cloud Operations & FinOps

Own multi-region cloud operations across AWS and Azure platforms.
Drive cost transparency and optimization via FinOps practices and dashboards.
Build capacity and resiliency models for predictable operations.
Conduct resiliency drills and game days to ensure high availability and compliance.

Site Reliability Engineering (SRE)

Establish SLIs, SLOs, and error budgets to measure reliability.
Build incident management playbooks and drive blameless postmortems.
Proactively improve reliability through automation, self-healing, and continuous testing.

Network Operations Center (NOC) Modernization

Transform NOC from alert-driven to predictive, AIOps-enabled operations.
Consolidate monitoring tools and reduce alert fatigue with intelligent correlation.
Ensure 24x7 support coverage through offshore team alignment and escalation management.

Observability & Telemetry

Build a unified observability stack (logs, metrics, traces, RUM) leveraging OpenTelemetry.
Enable business-oriented dashboards (device uptime, customer adoption, churn trends).
Ensure end-to-end visibility from connected devices → cloud microservices → customer-facing apps.

AIOps & MLOps [optional]

Deploy AIOps solutions for anomaly detection, predictive alerts, event correlation, and automated remediation.
Operationalize ML models: rollout, monitoring, drift detection, rollback strategies.
Showcase measurable value, e.g., warranty claim reduction, improved customer experience metrics.

Process Innovation & Automation

Audit current toolchain and processes; identify redundancies, gaps, and opportunities for automation.
Align with DevOps/SecOps to streamline release-to-operations handshakes.
Drive Infrastructure-as-Code for operations (Terraform, Ansible, GitOps).

Team Leadership & Offshore Management

Manage and mentor a distributed team (offshore and onsite), setting clear goals and accountability.
Define roles, responsibilities, and shift structures for 24x7 global coverage.
Build a culture of continuous improvement and operational excellence.

Compliance, Security & Risk

Ensure Rheem operations align with compliance standards (SOC2, ISO, HIPAA where applicable).
Own business continuity planning and disaster recovery testing.
Proactively identify operational risks and mitigate them before they impact the business.

Business Alignment & Change Leadership

Act as the voice of operations at business leadership tables.
Translate technical improvements into business outcomes (lower churn, improved uptime, faster installs, fewer complaints).
Champion a quarterly innovation agenda to showcase improvements in uptime, cost, and reliability.

Qualifications

Must-Have

Experience & Leadership

10+ years of experience in Cloud Operations, Site Reliability Engineering, or Digital Operations.
Proven track record of owning operational outcomes (uptime, MTTR, cost optimization, observability).
Experience managing offshore/global delivery teams with 24x7 coverage.
Strong leadership presence — able to act as a change agent, operate autonomously, and deliver measurable outcomes without day-to-day direction.

Cloud & Technical Expertise

Hands-on experience with AWS and/or Azure (multi-account, multi-region operations).
Solid expertise with observability & monitoring tools (Datadog, Dynatrace, Splunk, Grafana, Prometheus, ELK/EFK).
Familiarity with Infrastructure-as-Code (Terraform, Ansible, GitOps).
Strong understanding of SRE principles (SLIs, SLOs, error budgets, incident management frameworks).

Process & Governance

Demonstrated ability to design and implement operations frameworks (Ops playbooks, NOC modernization, incident command systems).
Knowledge of FinOps practices (cloud cost visibility, optimization, showback/chargeback).
Experience ensuring compliance with SOC2, ISO, HIPAA or equivalent standards.

Soft Skills

Excellent stakeholder communication skills — ability to link operational KPIs with business outcomes.
Strong team leadership and mentoring skills, especially across distributed teams.

Nice-to-Have

Exposure to AIOps platforms (Moogsoft, BigPanda, OpsRamp, ServiceNow AI modules).
Experience with MLOps tooling (MLflow, Kubeflow, SageMaker, Azure ML) for model deployment and monitoring.
Prior background in platform operations at a product/SaaS company (vs pure IT Ops).
Experience leading automation-first initiatives (predictive alerts, self-healing infra, auto-remediation pipelines).
Hands-on experience with CI/CD → Ops handshakes and change-impact assessments.
Cloud certifications:

AWS Certified Solutions Architect / DevOps Engineer
Microsoft Certified: Azure Administrator / Solutions Architect
FinOps Certified Practitioner
