What are the responsibilities and job description for the Senior HPC Engineer position at TAMUS?
Job Title: Senior HPC Engineer
Agency: Texas A&M University
Department: Technology Services - IT Enterprise Operations
Proposed Minimum Salary: $11,666.67 monthly
Job Location: College Station, Texas
Job Type: Staff

Job Description

Here’s a Glimpse of the Job

We are making a bold leap into the future of artificial intelligence with a $45 million investment in an NVIDIA DGX SuperPOD. This investment underscores our commitment to meeting the cutting-edge research and supercomputing needs of faculty and staff across all Texas A&M System members. As a Senior High Performance Computing (HPC) Engineer, you will provide technical expertise and consultation for the design and deployment of HPC systems. Get in on the ground floor with a team that is shaping the next generation of innovation. This position is security sensitive and requires U.S. citizenship.

Opportunities to Contribute

- Manage large-scale HPC cluster operations, including OS upgrades, firmware patching, and performance tuning.
- Oversee networking, security, and infrastructure for HPC systems.
- Lead the development of specialized HPC computing clouds and scalable storage systems.
- Collaborate with stakeholders to develop service-based solutions.
- Serve as a strategic technical resource across departments.
- Lead enterprise-wide HPC projects using established project management protocols.
- Mentor junior system administrators and enforce performance standards.

What you need to know

Salary: $130-140k annually
Location: In-person role in College Station, Texas
Schedule: This role may require working outside of standard office hours, including evenings, weekends, and holidays, to support the demands of technology services and ensure the seamless operation of essential systems.
Citizenship: Must be a United States citizen, permanent resident, or a person granted asylum or refugee status in accordance with 15 CFR, Part 762; 22 CFR §§122.5, 123.22 and 123.26; and 31 CFR § 501.601

Qualifications

- Bachelor’s degree in applicable field or equivalent combination of education and experience
- 12 years of related experience

A well-qualified candidate should possess one or more of the following:

- Experience with High Performance Computing (HPC) environments
- Advanced Linux system administration skills
- Familiarity with computer networking concepts and protocols
- Experience with container orchestration tools such as Kubernetes
- Knowledge of Run:ai for AI workload management
- Proficiency with Slurm workload manager
- Experience working with NVIDIA DGX systems
- Understanding of virtualization technologies
- Familiarity with Infrastructure as a Service (IaaS) platforms
- Experience with DDN storage solutions
- Knowledge of network-attached storage systems

Knowledge, Skills, and Abilities:

- Expertise in scalable supercomputing architectures, interconnects, and storage systems.
- Proficiency in scripting (Python, Bash, Perl) and scientific computing (MPI, OpenMP, CUDA).
- Experience with configuration management tools (Ansible, Puppet).
- Familiarity with container technologies (Docker, Singularity, Kubernetes).
- Strong troubleshooting, communication, and strategic planning skills.

Other Requirements and Factors:

- This role may require working outside of standard office hours, including evenings, weekends, and holidays, to support the demands of technology services and ensure the seamless operation of essential systems.
- This position is security sensitive.
- This position requires compliance with state and federal laws/codes and Texas A&M University System/TAMU policies, regulations, rules and procedures.
- All tasks and job responsibilities must be performed safely without injury to self or others in compliance with System and University safety requirements.
All positions are security-sensitive. Applicants are subject to a criminal history investigation, and employment is contingent upon the institution’s verification of credentials and/or other information required by the institution’s procedures, including the completion of the criminal history check.

Equal Opportunity/Veterans/Disability Employer.

Howdy and thank you for your interest in a career with Texas A&M University. As the flagship campus of The Texas A&M University System, we are located in College Station, Texas with a student population of more than 74,000 and nearly 14,000 faculty and staff. The Spirit of Aggieland is unmistakable. We are a unique American institution, fostering a culture of friendliness, compassion and respect for one another. Our unique history and rich traditions make Texas A&M special. From our benefits package and professional development opportunities to our retirement programs, Texas A&M is a great place to work. Your path to a great career starts here!

If you need assistance in applying for this job, please contact (979) 845-5154.