What are the responsibilities and job description for the Senior DevOps Engineer position at Torch.AI?
Join a Mission That Matters
Torch.AI builds the government’s AI-native control layer—a foundational data and intelligence capability designed to give the United States enduring ownership over how knowledge is created, managed, and applied in mission-critical environments. Our vision is simple and profound: ensure the U.S.—not a vendor—operates its own AI infrastructure to make decisive, real-time operational decisions.
We work at the center of national security, helping agencies transform data into actionable knowledge across domains including information advantage, ISR, joint effects, force protection, and mission support. If you’re driven to build technology that strengthens U.S. defense readiness and protects national interests, Torch.AI offers an opportunity to make meaningful impact at national scale.
What It’s Like To Work Here
Torch.AI operates at the intersection of advanced AI, secure systems, and real-world operations. We solve complex, high-stakes problems by building scalable, modular technology that reduces cognitive burden, improves mission performance, and delivers timely, reliable, predictive intelligence.
Here, you’ll collaborate with a diverse team of engineers, data experts, veterans, and mission practitioners. You’ll have the autonomy to own meaningful work and the support of teammates who share context openly, challenge assumptions, and think creatively. Every day brings something different—early prototypes, production deployments, mission support, or scaling next-generation capabilities.
We’re a fast-paced, entrepreneurial environment where curiosity, adaptability, and a deep commitment to national security drive everything we do.
Responsibilities
- Architect, implement, and maintain scalable DevOps infrastructure spanning development, staging, production, and multi-security-domain (low → mid → high-side) environments.
- Lead design and delivery of infrastructure-as-code (Terraform, CloudFormation, Ansible) with deep ownership of provisioning, configuration, and environment automation.
- Own container orchestration strategy (Kubernetes, ECS, Helm), ensuring reliable deployment, scaling, and automated recovery across mission workloads.
- Implement secure networking, including VPC design, bastion strategies, certificate management, NGINX/Envoy reverse proxies, and system hardening in accordance with STIG and enclave constraints.
- Lead observability architecture (Prometheus, Grafana, CloudWatch, ELK) for metrics, alerting, and distributed logging.
- Direct vulnerability scanning, patch management, and automated security compliance workflows.
- Collaborate intensively with AI/ML, data engineering, and software teams to ensure platform reliability and mission-aligned deployment patterns.
- Partner with security, compliance, and infrastructure stakeholders on deployments into classified and isolated cloud environments (SC2S, C2S, JWICS).
- Guide the evolution of CI/CD pipelines, release processes, and environment promotion strategies for both prototype and production-grade systems.
- Mentor DevOps and platform engineers, conduct design reviews, and set DevOps best practices across the organization.
- Produce architectural documentation, runbooks, and scaling/security strategies for platform reliability and mission uptime.
Qualifications
- B.S. or M.S. in Computer Science, Engineering, or related field.
- 8–12 years of DevOps, infrastructure engineering, or SRE experience.
- Direct experience deploying and managing systems in secure or classified cloud domains.
- Expert-level proficiency with Terraform or an equivalent IaC framework, including module design and environment automation.
- Production-level experience operating Kubernetes/ECS clusters and deploying microservices at scale.
- Strong expertise in Linux systems, networking, security hardening, IAM, STIG/SCAP processes, and enclave deployment constraints.
- Deep familiarity with observability stacks (Prometheus, Grafana, CloudWatch, or ELK).
- Strong scripting skills (Bash, Python) and experience with automation frameworks.
- Experience architecting CI/CD pipelines for complex and multi-environment deployments.
- Strong leadership skills: mentoring, architectural decision-making, and cross-team alignment.
- Nice-to-have:
  - Experience supporting AI/ML infrastructure or MLOps tooling
  - Experience with distributed systems, Kafka, or data streaming architectures
  - Exposure to time-series systems, HPC workflows, or large-scale mission computing environments
Some roles require an active Secret, Top Secret, or Top Secret/SCI clearance. If you do not currently hold an active clearance but believe you may be eligible, we encourage you to apply; sponsorship is available.
Work Location
We hire for roles at our headquarters in Leawood, KS, and for hybrid/remote positions in the Arlington, VA, Washington, DC, and Maryland (DMV) region. Some roles may require limited travel (<10%) to customer sites.
Incentives
- Equity participation for all employees within the first 12 months
- Competitive salary plus quarterly performance bonus
- Unlimited PTO plus 11 paid company holidays
- High-growth, mission-driven environment with significant professional development opportunities
- Weekly in-office catering at HQ
Torch.AI provides benefits that exceed regional and national benchmarks across multiple categories:
401(k)
- Employees may enroll in a company-sponsored 401(k) plan. Torch.AI does not currently provide matching.
Medical
- Three plan options: PPO, HSA, TRICARE Supplement
- HSA contributions significantly above regional norms
- Rare TRICARE Supplement offering (top ~18% of employers)
Spending Accounts
- HSA, FSA, and Dependent Care FSA with high employer contributions and rollover flexibility
Dental
- Plans with annual maximums up to 2.6× the national average
Vision
- VSP Choice network with above-market frame allowances
Life and Disability
- Employer-paid life insurance equal to 1× salary (substantially above-market coverage)
- Top 10% regional coverage for STD and LTD (both employer paid)
Voluntary Benefits
- Accident, Critical Illness, Hospital Indemnity
Commuter
- Commuter benefits up to $300/month (tax-free)
JOB CODE: 1000069