What are the responsibilities and job description for the Infrastructure Services Director - TALON position at Tyto Athene, LLC?
Tyto Athene is searching for a high-caliber Infrastructure Services Director to spearhead the establishment and operation of our high-performance AI R&D Lab/Data Center, the Technology Acceleration Lab for Operational Needs (TALON). This strategic role is critical for delivering high-quality, self-service infrastructure that empowers our AI R&D teams to rapidly develop and test mission-oriented solutions, including advanced defensive and mission cyber AI technologies. This leader must blend strategic planning, deep technical expertise (HPC/GPU), an unyielding commitment to CMMC compliance, and a strong focus on Site Reliability Engineering (SRE) and DevOps principles to ensure secure, efficient, and reliable service delivery. A core mandate is to manage the Service Catalog and implement processes that allow developers to "go fast" while adhering to strict security and operational guardrails.
Core Responsibilities
Lab Design & Physical Stand-up
- Lead Full-Scale Infrastructure Acquisition: Finalize labor requirements and coordinate with OEMs, VARs, and Software Vendors to build out the compute and transport core of the TALON facility.
- Operationalize the Data Center: Oversee the end-to-end delivery, racking/stacking, configuration, and integration of specialized GPU-based infrastructure.
- Create Collaborative Spaces: Maintain and optimize all lab IT infrastructure, including high-end audio-visual (AV) systems for mission briefings, WiFi, and software stacks.
- CIO Interface: Collaborate with the corporate CIO function to apply lessons learned from the corporate CMMC experience, integrate service management processes and tools, and extend the lab team with CIO staff for collaboration and touch labor.
- Facilities Interface: Lead coordination on critical power, cooling, UPS/generators, and physical security requirements for server rooms.
High-Performance Operations
- GPU & HPC Management: Apply DevOps principles to manage the entire lifecycle of assets, including state-of-the-art GPU technologies (NVIDIA NVLink, Spectrum-X) and workload schedulers like Slurm.
- Multi-Domain Connectivity: Design and secure LAN/WAN and OT connectivity, ensuring the lab can support NIPR and SIPR environments with a roadmap toward JWICS interconnects.
- Hybrid Cloud Acceleration: Provision and optimize AWS/Azure/GCP landing zones, enforcing guardrails while managing IL 2/4/5/6 environments.
- Cyber Network Strategy: Implement secure network segments specifically tailored for defensive and mission-oriented AI cyber projects.
Service & Compliance Management
- Self-Service Catalog: Own the service catalog and SLAs, implementing automated processes that empower developers to provision resources instantly.
- Secure-by-Design: Implement zero-trust controls, identity management, and SIEM logging to ensure the lab remains audit-ready (NIST/ISO) and aligned with CMMC standards without sacrificing performance.
Technical Qualifications
- 10 Years of Experience: Proven track record in core infrastructure operations, including Linux, virtualization, and disaster recovery, standardized via Infrastructure-as-Code (IaC).
- AI/HPC Mastery: Direct experience in HPC cluster administration and the integration of GPU software stacks (driver management, workload scheduling).
- Fabric & Storage: Expert knowledge of high-performance parallel filesystems and high-speed fabrics like InfiniBand.
- Modern Toolchain: Proficiency with Ansible, Git, Slurm, Zabbix, and container platforms (Kubernetes, Docker, Apptainer).
- Federal Compliance: Experience implementing security controls within a government contracting environment; ability to align infrastructure to CMMC Level 3 standards.
CMMC
- Experience devising a CMMC strategy and successfully attaining CMMC Level 3 accreditation for an AI-powered R&D lab serving a government contractor.