CloudOps Lead Engineer at Jobs via Dice: Responsibilities and Job Description
Dice is the leading career destination for tech experts at every stage of their careers. Our client, Sharpedge Solutions, is seeking the following. Apply via Dice today!
We are looking for a motivated and hands-on Cloud Ops Engineer to join our team and play a critical role in deploying, operating, and maintaining our cloud-based products. This role is essential to ensuring smooth deployments, high system reliability, and strong operational performance across our environments.
The ideal candidate has a strong desire to learn, is eager to understand our ecosystem and architecture in depth, and demonstrates ownership, leadership, and the ability to drive initiatives independently.
Key Responsibilities
- Cloud Operations & Deployments
  - Support and manage cloud deployments to ensure reliable, repeatable, and high-quality releases.
  - Own day-to-day cloud operational health, including uptime, performance, and stability.
  - Work closely with engineering teams to support product deployments and operational readiness.
- Automation & Scripting
  - Design, develop, and maintain automation scripts to improve deployment quality, efficiency, and consistency.
  - Continuously enhance CI/CD and operational workflows through scripting and tooling.
  - Reduce manual effort and operational risk through automation-first approaches.
- Monitoring & System Health
  - Implement and maintain monitoring, alerting, and logging to ensure system health and performance.
  - Proactively identify issues, perform root cause analysis, and drive permanent fixes.
  - Ensure systems are scalable, resilient, and performant.
- Architecture & Ecosystem Understanding
  - Develop a deep understanding of the platform architecture, cloud ecosystem, and dependencies.
  - Contribute to operational best practices, standards, and continuous improvement initiatives.
  - Act as a self-starter who can independently identify gaps and propose solutions.
- LLM & AI Enablement
  - Apply Large Language Models (LLMs) to cloud operations use cases such as automation, observability, diagnostics, or operational intelligence.
  - Stay current with advancements in LLMs and AI-driven tooling, and apply them pragmatically within the Cloud Ops domain.
  - Collaborate with engineering teams to integrate LLM-based capabilities into operational workflows.
Required Skills & Qualifications
- Hands-on experience with cloud platforms (Azure).
- Strong scripting skills (e.g., Python, Bash, PowerShell, or similar).
- Experience with deployment pipelines, automation, and monitoring tools.
- Solid understanding of cloud infrastructure, networking, and application operations.
- Practical experience working with Large Language Models (LLMs).
- Familiarity with applying LLMs to engineering or operational workflows is required.
- Strong desire to learn and deeply understand complex systems.
- Self-starter with the ability to take ownership and drive initiatives independently.
- Demonstrated leadership, accountability, and a problem-solving mindset.