What are the responsibilities and job description for the Senior Platform Engineer – DevOps job position at Bluebird International?
Senior Platform Engineer – DevOps job
Date published: December 8, 2025
ID: 13848
Location: Budapest
Job type: DevOps Engineer
Our mission is to build groundbreaking AI products from scratch. We're looking for a Senior Platform Engineer to architect and build the high-availability, scalable platform that will power our entire AI operation.
Our platform will be built on a multi-region Azure foundation (AKS + Cosmos DB + Event Hubs). We are just starting to build our Platform team, and you will be a founding member. You won't just be operating a platform; you will be building it from the ground up: from the Terraform code for our AKS clusters to the CI/CD pipelines for our models. This is a hands-on role focused on engineering & automation. We work according to SRE best practices with the goal of creating a platform that will achieve 99.9%+ availability.
What You'll Do:
· Build the Platform from Scratch:
o Code new AKS clusters, networking (VNet), and IAM guardrails using Terraform and Helm charts.
o Create 'golden' Docker images, GitOps pipelines (ArgoCD/Flux), automatic node provisioning, and scaling policies for both CPU and GPU workloads.
o Design and implement the core MLOps infrastructure, including artifact repositories, model registries, and feature stores.
· Automate for Reliability:
o Implement and fine-tune our observability stack: Azure Monitor metrics, Prometheus, Grafana dashboards.
o Build automated recovery mechanisms and chaos engineering tests to proactively find and fix weaknesses in the system.
· Champion Platform Best Practices:
o Work with development teams to ensure they are building reliable, observable, and secure applications from day one.
o Create runbooks and documentation to prepare for future incident management.
Key Responsibilities:
· IaC Development and Maintenance: Manage our infrastructure state with Terraform Cloud or Atlantis.
· Kubernetes Operations: Handle version upgrades, manage node pools (including GPU nodes), and define network policies.
· Data Environment Reliability: Ensure the reliability of our data stores (e.g., Cosmos DB geo-replication, Event Hubs consumer group management).
· Security Hardening: Implement security best practices, including CVE scanning for Docker images and regular patching of node images.
· Observability Pipeline: Manage log processing, alerting rules, and capacity forecasting to stay ahead of problems.
· Support AI Engineers: Provide a self-service platform and tooling that enables AI Engineers to train, deploy, and monitor their models with minimal friction.
What You'll Bring:
· 5+ years of experience in a DevOps, SRE, or Platform Engineering role.
· Deep, hands-on experience with at least one major cloud provider (Azure is a strong plus).
· Proven experience with containerization (Docker) and orchestration (Kubernetes) in a production environment.
· Expertise in Infrastructure as Code (Terraform is a must).
· Strong programming skills in a scripting language (Python is a strong plus).
· Experience building and maintaining production-grade CI/CD systems.
· A proactive mindset focused on preventing incidents rather than just reacting to them.
What We Offer:
· A Greenfield Opportunity: You will be building a state-of-the-art AI platform from the ground up, using the best tools for the job.
· A Modern Toolkit: Work with GitHub, Kubernetes, Managed Grafana, Terraform, and the latest Azure AI services.
· Real Impact: Your work is the foundation upon which our entire AI strategy is built. You are a critical enabler for the entire team.
· Focus on Engineering, Not Firefighting: In the initial phase, your role is 100% focused on building and automating, not on reactive, on-call firefighting.
· A Laid-back, Senior Team: We have one daily stand-up, then we focus on deep work.
· Competitive Salary.
· Home-office friendly, with a cool HQ in Budapest.