What are the responsibilities and job description for the Site Reliability Engineer position at Minerva Defense, Inc.?
Company Description
Minerva Defense, Inc. is a forward-thinking startup committed to advancing the arts and sciences to strengthen national security and foster prosperity. With a mission to deliver innovative solutions, the organization addresses critical challenges facing the defense sector. Rooted in a culture of cutting-edge technology and collaboration, Minerva Defense values integrity, excellence, and commitment to its mission. As a member of the team, you'll contribute to meaningful projects that have a direct impact on national security and strategic innovation.
Type: Full-Time
Location: Dayton, OH, or Huntsville, AL
Contingent on winning the ICAM Collection Development and Support TORP
Start Date: On or after June 5th, 2026
Expected Salary: $130K-$180K
Potential Sign-On Bonus: $10K
Relocation Incentive
Role Description
Minerva Defense, Inc. is seeking a Site Reliability Engineer for a full-time position based in Huntsville, AL, or Dayton, OH. The Site Reliability Engineer will ensure the reliability, scalability, and performance of systems and infrastructure. Responsibilities include designing, implementing, and maintaining infrastructure, troubleshooting system issues, and collaborating with cross-functional teams to create tools and processes that improve system efficiency and reduce downtime. This role offers the opportunity to influence the deployment of advanced technological solutions in a high-impact industry.
Our SREs own the reliability of systems they don't write: they define what "reliable enough" means from the user’s perspective, instrument and measure against those targets, and build the tooling and runbooks that make failure recoverable. They partner with dev teams to push operational quality upstream before code ships, and they lead resolution in production when things go wrong. SREs are comfortable debugging distributed systems, resolving incidents, and translating findings into lasting reliability improvements. Day-to-day responsibilities fall into four categories: Incident Response, Toil Reduction, Reliability Evaluations, and Platform Enablement.
Required Qualifications
- 1-3 years of experience in Operations, System Administration, DevOps, or Software Engineering
- Bachelor’s Degree in CS, Computer Engineering, or related technical field
- US Citizenship; must have or be able to obtain a Top Secret Clearance
- Systems thinking – understanding how systems fail together, blast radius, and more
- Observability Fundamentals – not just the three signals (metrics, logs, and traces), but knowing why and how to use telemetry to optimize services and engineering quality of life
- Basic software engineering – building automation & non-trivial APIs, git workflows, effectively engaging in code reviews
- Linux/networking fundamentals
- Strong Communication, Collaboration, and Organizational Skills
Specialty Skills (1 or more):
- Platform & Infrastructure - Kubernetes, Argo CD/GitOps, disaster recovery, capacity planning
- Observability - OTel standards, Grafana/Perses, Tempo, ClickHouse, VictoriaMetrics
- Automation & Toil Reduction - scripting, CI/CD, runbook automation, “DevOps”
- Developer Enablement - instrumentation SDKs, SRE practice onboarding
- Data & Alerting - dashboard quality, alert design, anomaly detection
Desired Qualifications
- SRE certifications from the DevOps Institute, AWS Solutions Architect certification, or similar
- Hands-on experience with: Python, Go, Kubernetes, Argo CD, GitLab/GitHub, Jenkins, Docker, Locust/Gatling, Prometheus, Grafana/Perses
Salary: $130,000 - $180,000