What are the responsibilities and job description for the Supervisor - Server Repair Engineering position at Method360 Talent Acquisition?
Title: Supervisor - AI Server Repair Engineering
Location: Grapevine, TX
Position Type: Full-Time Employment
$55,000 - $60,000 per year, plus benefits and bonus
**Green Card or US Citizenship required: the client is unable to sponsor visas at this time.**
Position Overview
- We are seeking a senior engineering leader to serve as the Supervisor of AI Server Repair Engineering & Process.
- This is a foundational role responsible for architecting, defining, and continuously improving the entire technical framework for diagnosing and repairing our complex, high-value AI server infrastructure.
- This is more than a traditional supervisory role: you will be the lead repair engineer and process owner.
- You will leverage your deep hardware expertise to develop systematic, data-driven, and scalable repair processes from the ground up.
- You will not only lead a team of technicians and junior engineers but also act as their primary technical mentor and the engineering liaison to our core Product Design and Quality teams.
- Your mission is to transform our repair facility into a center of excellence by embedding engineering discipline into every aspect of our service operations.
Key Responsibilities
1. Process Architecture & Definition (Primary Focus):
* Architect and Author: Design, document, and deploy the end-to-end technical workflow for AI server repair. This includes creating detailed Standard Operating Procedures (SOPs), diagnostic flowcharts, decision trees, and work instructions.
* Test Plan Development: Define and validate comprehensive test plans and validation criteria for all repaired components and full systems, ensuring they meet strict performance and reliability standards before being returned to service.
* Tooling & Automation: Identify, develop, and implement diagnostic scripts, software tools, and physical fixtures to improve the accuracy, consistency, and efficiency of the troubleshooting and repair process.
* Process Control: Establish critical control points within the repair process to ensure quality and gather vital failure data.
2. Advanced Engineering Support & Failure Analysis (Primary Focus):
* Technical Authority: Serve as the ultimate escalation point for the most complex hardware failures that elude standard diagnostic procedures.
* Root Cause Analysis (RCA): Lead systematic deep dives into new and recurring failure modes. Perform board-level analysis, interpret schematics, and collaborate with the team to isolate the root cause.
* Engineering Feedback Loop: Act as the primary technical interface between the repair center and core Hardware Engineering/R&D. Consolidate, analyze, and present failure data and RCA findings to influence future product design for improved serviceability and reliability (Design for Serviceability).
3. Operational Leadership & Team Enablement:
* Technical Mentorship: Lead and develop the technical capabilities of the repair team. Provide hands-on training on new products, advanced diagnostic techniques, and established repair processes.
* Enablement, Not Just Delegation: Empower the team by ensuring they have the processes, tools, and knowledge required to succeed. Focus on removing technical roadblocks and fostering an environment of structured problem-solving.
* Performance Management: Set clear technical objectives, manage workflow priorities based on engineering needs, and guide the professional growth of team members.
4. Data-Driven Continuous Improvement:
* Analyze Repair Data: Systematically collect and analyze repair data (failure modes, component usage, test yields) to identify trends and opportunities for process optimization.
* Drive Improvements: Initiate and lead engineering change requests (ECRs) and process improvement projects based on data analysis to enhance repair quality, reduce turn-around time, and lower costs.
Required Qualifications (Must-Haves):
Education:
- Bachelor’s degree in Electrical Engineering, Computer Engineering, Manufacturing Engineering, or a closely related field.
Experience:
- 4 years in a technical engineering role such as Test Engineering, Manufacturing Engineering, Hardware Sustaining, or high-level Repair Engineering.
- Proven track record of developing and documenting technical processes (SOPs, test plans, work instructions) from scratch in a manufacturing or repair environment.
- 3 years in a technical leadership role, mentoring junior engineers or technicians.
Technical Expertise:
- Expert-level ability to read and interpret electronic schematics, board layout files, and product specifications.
- Strong, hands-on experience with systematic hardware troubleshooting methodologies for complex systems (e.g., servers, networking equipment).
- Demonstrated proficiency in scripting (Python, Bash, or similar) to automate diagnostic tests and parse data logs.
- Deep knowledge of server components and architecture, including GPUs, high-speed interconnects (InfiniBand/Ethernet), CPUs, and power systems.
Preferred Qualifications (Nice-to-Haves):
- Master’s degree in Electrical or Computer Engineering.
- Experience with Design for Manufacturability (DFM) or Design for Serviceability (DFS) principles.
- Certification and practical application of Lean Manufacturing or Six Sigma methodologies.
- Experience analyzing failure and yield data.
- Hands-on experience with board-level repair techniques (e.g., soldering, BGA rework) is a strong plus.
Salary: $55,000 - $60,000 per year