What are the responsibilities and job description for the Senior Windows & SQL Server Administrator HPC / Analytics Platform position at SLK America Inc.?
Job Title
Senior Windows & SQL Server Administrator HPC / Analytics Platform
Job Summary
This role owns and independently executes end-to-end operations of a Windows-based analytics and High-Performance Computing (HPC) platform. The position focuses on maintaining platform stability, performance, and security, and on supporting migrations across on-premises and cloud environments.
Key Responsibilities
Platform Operations & HPC Administration
- Operate and support a Windows-based analytics platform running on Microsoft SQL Server and Microsoft HPC.
- Administer Microsoft HPC environments, including head nodes, compute nodes, and scheduling services.
- Monitor and manage compute-intensive workloads, ensuring stable and timely execution.
- Troubleshoot job execution issues such as:
  - Queue backlogs
  - Stuck or failed jobs
  - Abnormal CPU utilization on compute nodes
HPC Cluster Health & Maintenance
- Maintain overall HPC cluster health, including:
  - Node availability and state management
  - Network connectivity and communication
- Perform controlled maintenance activities such as:
  - OS patching
  - Rolling reboots
  - Platform upgrades
Windows Server Administration
- Execute routine Windows Server operations, including:
  - OS patching and security updates
  - Service monitoring
  - Disk, CPU, memory, and network health checks
- Configure performance-sensitive OS settings to support high-demand workloads.
SQL Server Administration
- Administer Microsoft SQL Server instances, covering:
  - Backup and restore validation
  - Job monitoring and scheduling
  - Index and statistics maintenance
  - Transaction log and capacity monitoring
Troubleshooting & Support
- Troubleshoot cross-stack issues spanning:
  - HPC scheduler and compute nodes
  - Windows operating system and services
  - SQL Server performance and availability
- Manage service accounts, permissions, and security configurations across platform components.
- Support incident, change, and problem management activities in line with IT service management processes.
Documentation & Coordination
- Maintain and update:
  - Operational runbooks
  - Health check procedures
  - Disaster recovery documentation
- Coordinate with application, infrastructure, and database teams during critical processing windows.
Migration & Modernization Activities
- Plan and execute Windows and SQL Server workload migrations across datacenters, including:
  - Dependency analysis
  - Cutover coordination
  - Post-migration validation
- Assess SQL Server workloads for migration to:
  - Azure SQL Database
  - Azure SQL Managed Instance
  - Azure Virtual Machines
- Identify and remediate compatibility gaps during migration exercises.
- Support data migration, cutover, rollback planning, and post-migration performance stabilization across:
  - On-premises environments
  - Cloud platforms
Decommissioning & Post-Migration Support
- Coordinate decommissioning of legacy servers following migrations.
- Update runbooks, monitoring, and support procedures after platform changes.
Required Skills & Experience (Implied from JD)
- Strong experience with Windows Server administration
- Hands-on expertise in Microsoft SQL Server
- Experience supporting Microsoft HPC environments
- Solid understanding of performance tuning, monitoring, and troubleshooting
- Experience with datacenter and cloud (Azure) migrations
- Ability to work independently and own end-to-end responsibilities
Salary: $110,000