What are the responsibilities and job description for the Machine Learning Engineer, Speech LLM Training - San Francisco position at ChatGPT Jobs?
Job Description
SpeechLLM Engineer / AI Research Engineer
San Francisco, CA - On-site
Company Overview
Plaud is building the world's most trusted AI work companion for professionals. A Delaware-incorporated, San Francisco-based company, Plaud combines hardware and software to amplify human intelligence, serving over 1.5 million users globally. The company is bootstrapped, profitable, and maintains a $250M revenue run rate.
Role Overview
Join the founding team to define the next-gen paradigm for human-AI interaction. You will build and train large-scale audio or speech models (SpeechLLMs) from the ground up, ranging from unified architectures to edge-device optimization. You will own ambiguous problems and drive them directly into production.
Key Responsibilities
- Build and train large-scale audio or speech models (Unified SpeechLLMs, ASR, TTS, Generative Audio).
- Design novel sequence modeling architectures and debug distributed training clusters.
- Traverse the entire stack: from signal processing and raw acoustic representations to foundation model training.
- Optimize large-scale distributed training runs, manage GPU memory, and resolve performance bottlenecks.
- Design and train state-of-the-art neural audio codecs, diffusion models, or flow matching architectures for voice generation.
- Apply RL techniques (RLHF, GRPO) to improve conversational cadence and model steerability.
- Optimize end-to-end inference for real-time cloud streaming using frameworks like vLLM or TensorRT-LLM.
- Manage massive GPU clusters and distributed training frameworks (FSDP, DeepSpeed, Kubernetes).
Requirements
- Proven track record of building and training large-scale audio/speech models from the ground up.
- Deep expertise in PyTorch or JAX with experience in distributed training.
- Ability to work at the intersection of research and engineering.
- Experience with optimization of large-scale GPU memory and performance.
- Thrives in a fast-paced, high-growth startup environment.
Preferred Qualifications
- Experience with Text-based LLMs (pretraining, instruction tuning, RLHF).
- Hands-on experience with Neural Audio Codecs.
- Experience with Generative Architectures (diffusion, flow matching, autoregressive).
- Deep Systems Optimization experience (high-throughput serving frameworks).
- Large-Scale Infrastructure management.
Compensation and Benefits
- Salary: $180K - $270K base + performance bonus + equity.
- Health: Comprehensive, top-tier healthcare (including dental and vision) for employees and dependents.
- Retirement: 401(k) with company matching.
- Time Off: Unlimited PTO, 13 paid holidays, and 12 weeks of paid new-parent leave.
- Work Model: Hybrid (minimum 3 days in office per week in San Francisco).
- Perks: Top-of-the-line laptops/workstations, annual offsites, fully stocked office.