What are the responsibilities and job description for the Senior Infrastructure/Platform Engineer position at Tech Observer?
Job Overview
We are looking for an Infrastructure Development Engineer to design, operate, and scale foundational datacenter services that power bare-metal, virtualization, and cloud-adjacent platforms. This role owns the automation to boot and manage critical services such as corporate IPAM/DDI, CMDB, and datacenter bootstrapping systems.
You will work across hardware, networking, and platform teams to ensure infrastructure is discoverable, automated, reliable, and ready for self-service consumption.
Key Responsibilities:
Build Automation and Tools in Python
- Develop Python-based tools and services for provisioning, configuration, monitoring, and self-service workflows
- Automate repetitive operational tasks (imaging, deployments, health checks, remediation) and reduce manual intervention
- Integrate with internal and external APIs to orchestrate infrastructure workflows (compute, storage, network, cloud)
- Developers with experience in other programming languages, such as C or Java, will also be considered.
Software Defined Network Services: IPAM, DDI & CMDB
- Own and operate the corporate IP Address Management (IPAM) and DDI (DNS, DHCP, IPAM) platforms.
- Design scalable IP allocation, DNS, and DHCP strategies across multiple datacenters and environments.
- Integrate IPAM/DDI systems with provisioning, bootstrapping, and CMDB workflows.
- Act as a steward of the CMDB, ensuring accuracy, consistency, and automation-driven updates.
- Define and enforce standards for asset discovery, lifecycle state, and dependency mapping.
Monitoring, Observability, and Reliability
- Implement and improve monitoring, alerting, and dashboards for infrastructure health (e.g., Prometheus, Grafana, ELK/Nagios or similar)
- Define and track key metrics (availability, latency, capacity, error rates), and drive improvements based on data
- Participate in incident response, perform root cause analysis, and implement long-term fixes and runbooks
Required Skills & Experience:
- Experience with bare-metal provisioning and hypervisor deployment.
- Hands-on experience with OpenStack, VMware, KubeVirt, or similar virtualization platforms.
- Deep understanding of IPAM, DNS, and DHCP at enterprise scale.
- Experience operating or integrating CMDB systems as a source of truth.
- Solid knowledge of datacenter networking concepts, including Fibre Channel.
- Proficiency with Linux systems and troubleshooting at hardware and OS layers.
Automation & Systems Thinking
- Experience building infrastructure automation and onboarding pipelines.
- Familiarity with API-driven integrations and workflow orchestration.
- Ability to reason about infrastructure as a platform, not just individual systems.
Collaboration & Ownership
- Comfortable working cross-functionally with hardware, network, storage, and SRE teams.
- Strong operational mindset with a focus on reliability, correctness, and supportability.
- Ability to drive ambiguous problems to clear, automated solutions.
Nice to Have:
- Experience with large-scale internal platforms or “infrastructure as a product”.
- Background in SRE or reliability engineering.
- Exposure to self-service infrastructure platforms and developer enablement.
- Experience operating in multi-datacenter or hybrid environments.
Server Bootstrapping & Provisioning Automation
- Familiar with datacenter bootstrapping services, including PXE, imaging, and initial OS/hypervisor provisioning.
- Ensure seamless handoff from hardware arrival to production-ready infrastructure.
- Improve time-to-serve metrics for new racks, clusters, and testbeds.
Salary: $70 - $772