AI Ops – Senior Architect at Value Spectrum Technologies LLC || Onsite in Phoenix, AZ || W2 & C2C || Need Local to AZ
Role: Senior AI Ops Architect
Location: Onsite in Phoenix, AZ
Experience: 12 years
Job Description
We are seeking a highly skilled AI Ops – Senior Architect to lead the design, implementation, and optimization of AI-driven operational platforms across large-scale, mission-critical environments. The ideal candidate will possess deep expertise in machine learning–enabled operations, observability, automation frameworks, cloud engineering, and enterprise SRE/DevOps practices. This role will drive the transformation of traditional IT operations into intelligent, autonomous, self-healing systems.
The Senior Architect will collaborate with cross-functional engineering, cloud, platform, and data science teams to deliver predictive, proactive, and automated operational outcomes.
Key Responsibilities
AI-Driven Operations Architecture
- Lead the architecture and implementation of AI-powered operational frameworks, including predictive analytics, anomaly detection, NLP-driven automation, and auto-remediation systems.
- Define and evolve the overall AI Ops strategy, roadmap, standards, and governance.
- Implement intelligent monitoring and decision models that enhance reliability and operational efficiency.
- Architect solutions that integrate machine learning models into production operations workflows.
Observability, Monitoring & Automation
- Design end-to-end observability ecosystems (metrics, logs, traces, topology, events) integrated with AI/ML platforms.
- Build anomaly detection models using ML and time-series analysis to identify issues before failures occur.
- Drive automated incident detection, impact assessment, and classification using AI-based models.
- Implement proactive auto-healing and automated resolution workflows.
Cloud & Platform Engineering
- Architect scalable AI Ops platforms using AWS, Azure, or Google Cloud Platform cloud-native services.
- Design infrastructure and pipelines for AI-driven monitoring and operational insights.
- Integrate AI Ops capabilities with Kubernetes, service mesh, cloud-native microservices, and distributed systems.
- Optimize cost, performance, and reliability using intelligent orchestration and scaling.
Data Engineering & ML Ops Integration
- Partner with data engineering teams to build robust data pipelines for operational data ingestion.
- Work with ML Ops teams to operationalize ML models, including training, evaluation, deployment, and monitoring.
- Ensure continuous retraining and drift detection for AI Ops models.
- Define data taxonomies, quality standards, and metadata management for operational datasets.
SRE, DevOps & Automation Frameworks
- Align AI Ops with SRE principles, SLIs, SLOs, and error budgets.
- Integrate AI-driven insights into CI/CD pipelines and operational workflows.
- Develop event-driven, automated runbooks using ML and rule-based systems.
- Implement intelligent capacity planning, scaling, and resource optimization.
Security, Compliance & Governance
- Ensure AI Ops solutions meet enterprise security, compliance, and audit requirements.
- Define governance frameworks for AI model usage, transparency, and monitoring.
- Collaborate with cybersecurity teams on intelligent threat detection and risk analysis.
Leadership & Collaboration
- Provide architectural leadership and technical direction to engineering and operations teams.
- Mentor teams on AI Ops concepts, automation, and intelligent operations.
- Present architecture proposals and operational improvements to leadership stakeholders.
- Influence enterprise-wide transformation toward autonomous operations.
Required Skills & Experience
- 12 years of IT experience, including 5 years in SRE/DevOps/AI Ops architecture.
- Strong expertise in:
  - AI Ops platforms (Moogsoft, Dynatrace Davis AI, BigPanda, New Relic AI, Datadog AIOps)
  - Observability stacks (Prometheus, Grafana, ELK, Splunk, AppDynamics)
  - ML pipelines and ML Ops tooling (SageMaker, Vertex AI, MLflow, Databricks)
  - Cloud architectures on AWS / Azure / Google Cloud Platform
  - Event-driven systems and automation tools
- Strong programming/scripting in Python, Go, or Java for automation and ML integration.
- Experience with Kubernetes, Docker, microservices, and distributed systems.
- Deep understanding of time-series analysis, anomaly detection, NLP, and predictive analytics.
- Experience operationalizing ML models and integrating them into production systems.
Preferred Qualifications
- Certifications in cloud architecture or ML engineering.
- Background in enterprise-scale SRE, observability, or operations automation.
- Experience with LLM-based automation and AI agents for IT operations.
- Experience in highly regulated industries (Finance, Healthcare, Telecom).
Salary: $60 - $65