What are the responsibilities and job description for the Kubernetes/Integration Platform Engineer position at Expedite Talent Solutions?
Overview
This role is critical to the implementation, development, and maintenance of Kubernetes and NATS platforms. These enterprise platforms power compute and data integration across Edge sites (on-premises) and Cloud (AWS). The platforms support use cases spanning Manufacturing, Supply Chain, and Commercial activities, with a focus on delivering high-availability, reliable platforms that enable mission-critical applications, including AI workloads.
Platform Team:
The platform team is a 24/7 capability responsible for maintaining and enhancing compute and integration capabilities, especially using Kubernetes and NATS (streaming technologies) across Edge sites and Cloud environments. This team collaborates closely with application teams and stakeholders across the organization.
Responsibilities
- Design, build, and operate Kubernetes clusters and container platforms at scale supporting multiple environments (dev, staging, production) across Edge and Cloud (AWS)
- Implement and maintain CI/CD pipelines for automated deployment and infrastructure provisioning
- Develop infrastructure as code using tools such as Terraform, Helm, Ansible, or similar technologies
- Monitor platform health, troubleshoot infrastructure issues, and drive continuous improvement
- Collaborate with development teams to containerize applications and optimize resource utilization
- Support critical platforms through consulting, debugging, break/fix execution, and participation in the on-call rotation
- Enable self-service capabilities for application teams on Kubernetes and NATS platforms
- Proactively seek and share knowledge, build strong networks, and drive continuous improvement through learning, experimentation, and constructive challenges
- Be willing and able to join an on-call rotation covering nights and weekends to respond to critical outages and incidents
Qualifications
- Strong expertise with Kubernetes, including cluster setup, management, networking, storage, RBAC, and troubleshooting
- Demonstrated experience creating and maintaining infrastructure using Infrastructure-as-Code tools (Terraform, Helm, CloudFormation, Ansible)
- Experience managing a GitHub organization and applying DevOps and Infrastructure-as-Code practices with GitHub Actions workflows
- Experience with container orchestration and GitOps tools like ArgoCD and Rancher
- Experience administering cloud platforms (AWS, Azure), with emphasis on enterprise governance, support, automation, and development
- Proficiency in Python or Go for developing automation scripts, tools, and infrastructure management solutions
- Experience with data streaming technologies such as Kafka, NATS, MQTT, or similar platforms
- Experience with monitoring and logging tools such as Prometheus, Grafana, Splunk
- Experience with edge computing architectures and hybrid cloud deployments
- Certifications such as CKA (Certified Kubernetes Administrator), CKAD (Certified Kubernetes Application Developer), or cloud provider certifications (AWS Solutions Architect, etc.)