What are the responsibilities and job description for the Senior GPU Platform Engineer with AI Infrastructure Operations position at Jobs via Dice?

Dice is the leading career destination for tech experts at every stage of their careers. Our client, WB Solutions LLC, is seeking the following. Apply via Dice today!

Greetings from WB Solutions!

We have an urgent requirement for a Senior GPU Platform Engineer - AI Infrastructure Operations in Redmond, WA (long-term contract).

Role: Senior GPU Platform Engineer - AI Infrastructure Operations

Location: Redmond, WA (onsite 4 days a week)

Note: A LinkedIn profile is required.

Must-Have Skills:

Configuration Management
GPGPU/GPU
Hardware Troubleshooting
Infrastructure & Operations
Infrastructure Automation and Orchestration
Linux Administration

Description:

Join our team to operate and support cutting-edge GPU infrastructure powering AI and high-performance computing workloads for a leading global hyperscale cloud provider. In this hands-on role, you'll manage the full lifecycle of NVIDIA GPU platforms from bring-up to break/fix while ensuring optimal performance for advanced AI applications.
At EPAM, you'll work on cutting-edge technologies, solve complex challenges, and shape the future of digital innovation. With access to continuous learning, mentorship, and global projects, your expertise will drive meaningful change.

Responsibilities:

Operate and maintain production GPU and bare-metal compute platforms with hands-on hardware management
Perform physical infrastructure tasks including rack/stack, cabling, power validation, and system bring-up
Diagnose hardware faults, replace failed components, and coordinate vendor support for complex issues
Install and configure Linux operating systems with GPU-specific drivers and software stacks
Execute platform validation using diagnostic tools to ensure GPU health, stability, and performance
Provision bare-metal systems through automated workflows and troubleshoot configuration issues
Apply firmware, BIOS, and platform configuration changes following standardized change processes

Requirements:

5 years of professional experience supporting production server infrastructure in data center environments
Strong Linux administration skills with ability to independently troubleshoot system-level issues
Hands-on experience with physical server hardware including diagnostics and component replacement
Familiarity with GPU platforms, preferably NVIDIA, and associated drivers and software stacks
Experience working in structured, change-controlled production environments
Knowledge of infrastructure monitoring tools and alert response procedures
Excellent communication skills with ability to collaborate across operations and engineering teams

Location: On-site position in the Greater Seattle/Redmond area requiring regular hands-on access to hardware in lab or data center environments.
