What are the responsibilities and job description for the SRE position at Jobs via Dice?
Dice is the leading career destination for tech experts at every stage of their careers. Our client, Shimento, Inc., is seeking the following. Apply via Dice today!
SRE
Introduction:
We are looking for a highly skilled Site Reliability Engineer (SRE) to join a fast-paced engineering team focused on building and scaling next-generation infrastructure platforms. This role offers the opportunity to work across Kubernetes, cloud infrastructure, AI-enabled operations, and modern DevOps ecosystems.
Responsibilities:
- Design, build, and manage large-scale multi-cluster Kubernetes platforms across cloud and on-prem environments.
- Develop and maintain controllers, CRDs, ingress, DNS, and TLS automation to support scalable infrastructure provisioning.
- Build and secure platform microservices including CI/CD pipelines, SSO, RBAC, secret management, and monitoring workflows.
- Integrate AI-driven tooling and automation into operational workflows to improve platform scalability and efficiency.
- Own end-to-end production release management including Helm deployments, multi-architecture container builds, staged rollouts, and rollback strategies.
- Implement observability, audit logging, analytics, and automation to support high-scale production operations.
Required Qualifications:
- 6 years of hands-on DevOps/SRE experience supporting production Kubernetes environments.
- Strong expertise with Kubernetes operators, CRDs, ingress controllers, and cluster networking.
- Experience integrating AI tools and automation into engineering workflows.
- Strong programming skills in Python or Go, along with working knowledge of TypeScript/React.
- Solid experience with AWS or similar cloud platforms, OIDC/SAML authentication, and secret management solutions.
- Strong understanding of relational databases, caching technologies, and communication mechanisms such as WebSockets, SSH tunneling, and message queues.
- Proven experience building internal developer platforms and enterprise CI/CD pipelines.
- Bachelor’s or Master’s degree in Computer Science or related field.
- Experience with Linux administration, multi-tenant platforms, virtual machines, and Kubernetes orchestration.
- Deep understanding of agentic AI workflows, developer tooling, CLI frameworks, and MCP ecosystems.
- Experience building AI-assisted operational tooling including anomaly detection, automated runbooks, and LLM-powered operations workflows.
- Hands-on expertise with Jenkins, GitLab CI, ArgoCD, Flux, Prometheus, Grafana, VictoriaMetrics, Datadog, Splunk, or Kibana.
- Strong documentation and troubleshooting mindset with a passion for creating scalable operational runbooks and knowledge bases.