What are the responsibilities and job description for the Site Reliability Engineer (DevOps, cloud infra, data) position at Intelliswift - An LTTS Company?
Best job title: Site Reliability Engineer, but could also be DevOps, Cloud Infrastructure, Data, or Platform Engineer.
Location: Cupertino, CA
Skills Matrix
- AWS & EKS (Amazon Elastic Kubernetes Service)
- Kubernetes / kubectl / Helm
- CI/CD Pipeline Automation (Rio CI/CD)
- ELK Stack (Elasticsearch, Logstash, Kibana)
- Cloud Migration (Docker to AWS)
- Data Pipeline Engineering (S3 & Postgres)
- Infrastructure as Code (IAM & Secret Management)
Job Description
- AWS & Kubernetes Administration — Manage and maintain AWS infrastructure with hands-on expertise in EKS; configure kubectl access, deploy and operate containerized services including Postgres, bldrapp, bldrapp-summary, and ingress controllers across the cluster.
- CI/CD Pipeline Ownership — Own and evolve the automated build and deployment pipeline using Rio CI/CD, ensuring code changes are reliably built, tested, and promoted through environments with minimal manual intervention.
- Data Pipeline Engineering — Design, deploy, and maintain ingest workflows using Kubernetes CronJobs to reliably move data from Conductor S3 into Postgres, with full ownership of credentials, IAM access, and Kubernetes secret management.
- Cloud Migration & Architecture Fluency — Demonstrate a strong understanding of legacy infrastructure patterns (NAS Docker) and modern cloud-native equivalents; able to reason about migration phases, deployment pipelines, and architectural trade-offs when evolving the system.
- Documentation & Operational Excellence — Proactively contribute to and maintain runbooks, troubleshooting guides, and onboarding documentation; capable of diagnosing and resolving known failure modes independently using documented playbooks and root-cause analysis.
- Observability & Log Management (ELK Stack) — Deploy and manage the ELK stack (Elasticsearch, Logstash, Kibana) to centralize logging, build dashboards, and enable real-time monitoring and alerting across all services and infrastructure components.
- Quality Assurance & Testing — Collaborate with QA to define, implement, and maintain automated test strategies across unit, integration, and end-to-end layers; ensure deployment pipelines enforce quality gates before changes reach production.