What are the responsibilities and job description for the SRE - Backend (Java) - Austin, TX & Sunnyvale, CA position at Jobs via Dice?
Dice is the leading career destination for tech experts at every stage of their careers. Our client, XFORIA Inc, is seeking the following. Apply via Dice today!
Job Title: SRE - Backend (Java)
Location: Sunnyvale, CA / Austin, TX
Duration: Full-time
Job Description:
Key Responsibilities
- Architect and drive large-scale migrations of business-critical services to AWS and Kubernetes-based platforms
- Define and implement GitOps-first deployment strategies using ArgoCD, with Spinnaker for advanced delivery workflows
- Design, build, and operate production-grade AWS EKS platforms at scale
- Establish best practices for CI/CD, deployment automation, and release strategies (blue/green, canary, progressive delivery)
- Design and maintain reusable Helm charts and standardized deployment patterns
- Develop and maintain Python-based tooling and automation for deployment, operations, and reliability
- Provide deep Linux systems expertise, including performance tuning, debugging, and incident mitigation
- Own and support production systems, including on-call participation, incident response, and root cause analysis
- Partner with SRE and Security teams to embed reliability, scalability, and security into platform design
- Drive architectural reviews, author design documents, and influence long-term platform and migration roadmaps
- Mentor engineers and raise the bar for DevOps and platform engineering practices
Qualifications
- 10 years of experience as a Cloud / DevOps / Platform Engineer supporting production systems
- Proven experience leading AWS migrations for large, high-traffic, business-critical platforms
- Strong hands-on expertise with:
  - Linux systems (performance tuning, networking, troubleshooting)
  - Python for automation, tooling, and operational workflows
  - AWS (EKS, VPC, IAM, EC2, ALB/NLB, CloudWatch, S3, RDS)
  - Kubernetes (EKS) in production environments
  - ArgoCD and GitOps deployment models
  - Spinnaker for continuous delivery
  - Helm for application packaging and release management
- Experience operating and supporting production environments with on-call responsibility
- Experience with Infrastructure as Code (Terraform and/or CloudFormation)
- Strong understanding of distributed systems, networking, and cloud security
- Ability to lead through influence and collaborate across engineering disciplines
- Familiarity with Akamai CDN, caching strategies, and edge delivery patterns
- Experience with Redis (caching, replication, high availability)
- Experience with Kafka or other distributed messaging systems
- Experience operating platforms at scale (hundreds of services, multi-team environments)
- Experience with observability platforms (Prometheus, Grafana, OpenTelemetry, Splunk, Datadog)
- Familiarity with SRE practices including SLIs, SLOs, error budgets, and incident response
- Experience with service mesh technologies (Istio, Linkerd)
- Strong written and verbal communication skills, including technical design documentation and executive-level discussions