What are the responsibilities and job description for the Member of Technical Staff (ML Infrastructure) position at Stott and May?
Job Description
We’re partnering with a cutting-edge AI startup building next-generation infrastructure to power large-scale, intelligent systems. Their mission is to bridge the gap between world-class AI research and production-grade deployment, enabling faster experimentation, high-performance inference, and reliable large-scale training.
As a Member of Technical Staff (ML Infrastructure), you’ll design and scale the systems that keep state-of-the-art AI running, from distributed training clusters and inference engines to agentic frameworks and post-training pipelines. You’ll work alongside a small, elite team of researchers and engineers who move fast, think big, and take full ownership of their work.
What You’ll Do
- Design, build, and optimize high-performance ML infrastructure for large-scale training, inference, and evaluation.
- Develop and maintain distributed systems that power large compute clusters and AI networking.
- Streamline research workflows and accelerate experimentation by improving data pipelines (data collection, loading, SFT, RL).
- Enhance inference performance across both open-source and proprietary inference engines.
- Establish strong engineering practices for observability, reliability, and scalability.
- Collaborate with researchers and product teams to translate cutting-edge ideas into robust, production-ready systems.
What You’ll Bring
- Deep expertise in one or more of the following: inference optimization, GPU performance, cluster scheduling, or large-scale infrastructure.
- Strong experience with modern ML frameworks (e.g., PyTorch, vLLM, Verl).
- Startup-ready mindset: high ownership, adaptability, and comfort working in fast-moving environments.
- Passion for bridging research and real-world impact.
Why Join
- High impact: You’ll ship meaningful work in weeks, not months.
- Elite team: Work alongside ex-founders, top AI researchers, and engineers from leading tech companies.
- Momentum: Well-funded, fast-growing, and laser-focused on building and shipping real products powered by cutting-edge AI.