Principal Engineer, DevOps & Infrastructure at ZeroEyes
We’re looking for a low-ego, high-ownership Principal Engineer to provide technical oversight, mentoring, and hands-on design for our DevOps and Infrastructure-as-Code (IaC) stack. You’ll set the bar for reliability, security, and velocity across our cloud and orchestration platform. AWS and FedRAMP experience is highly desired: you’ll help us build, document, and run systems that meet rigorous compliance requirements while staying developer-friendly and cost-efficient.
We value people who have strong, well-informed opinions and express them diplomatically, who care more about truth than winning arguments, who mentor generously, and who take personal responsibility for the organization’s success.
Responsibilities:
- Own the platform architecture: Define the target state for cloud, network, identity, and runtime orchestration across AWS.
- Lead Infrastructure as Code: Establish standards and reusable modules (Terraform/Pulumi), policy-as-code (Kyverno), GitOps workflows (FluxCD/Kustomize), and immutable images (GitLab/Dagger).
- Drive FedRAMP/NIST alignment: Map controls (NIST 800-53), lead the technical portions of the SSP, implement CIS/STIG hardening, FIPS-validated cryptography, and logging/monitoring requirements, and support ATO and continuous monitoring (ConMon) with auditors and the 3PAO.
- Elevate reliability: Define SLOs and error budgets, incident response and runbooks, blameless postmortems, chaos and DR testing, multi-AZ/multi-region strategies, and cost-aware resilience targets (RTO/RPO).
- Hands-on enablement: Pair with teams to deliver CI/CD (GitLab Pipelines), container platforms (Kubernetes/EKS/GKE, ECS, Fargate), and service meshes/ingress.
- Security by default: Identity and access (IAM/Okta, SSO/SAML/OIDC), secrets management (AWS Secrets Manager/KMS), supply-chain security (SBOM, Sigstore/Cosign, SLSA/SSDF), and network segmentation/zero trust.
- Observability: Standardize metrics/logs/traces (Prometheus/Grafana/OpenTelemetry, ELK/Datadog), golden signals, actionable alerts, and capacity planning/FinOps.
- Mentor & multiply: Coach SRE/DevOps/Platform engineers, run design reviews/ADRs, and establish pragmatic guardrails that speed teams up.
- Vendor & cost management: Evaluate and right-size infra/services; build dashboards and budgets that the business can trust.
- Documentation & audits: Keep docs current (runbooks, diagrams, control evidence); make auditors happy without slowing engineers.
Qualifications:
- 10 years building and running production infrastructure; 5 years leading DevOps/SRE or platform teams; prior Principal- or Staff-level scope.
- FedRAMP Moderate experience and working with a 3PAO.
- Deep IaC expertise (Terraform or Pulumi), GitOps, and modern CI/CD; broad knowledge that includes container orchestration (ECS/EKS) and container security.
- Multi-cloud proficiency (AWS strongly preferred).
- Security engineering literacy: NIST/CIS/STIG, FIPS 140-2/3 crypto usage, key management (KMS/HSM), least-privilege IAM, and policy-as-code.
- Observability at scale (metrics/logs/traces), performance tuning, and cost governance/FinOps practices.
- Strong coding skills in at least one of C /Python/Golang/.NET, plus Bash; able to build tooling and not just wire it together.
- Excellent design-doc writing and clear, candid communication; a proven track record of mentoring senior engineers.
- Nice-to-have: data pipeline or real-time video/ML workloads, service mesh (Istio), and incident command experience.
Our values:
- No jerks
- Be authentic
- Be effective
- Attention to detail
- All in, all the time
- Must be authorized to work in the U.S. Ability to obtain and maintain a Public Trust or other clearance may be required.