Job Description - Plaud Inc.
Speech Evaluation Engineer (Speech LLM)
Company: Plaud Inc.
Location: San Francisco, CA
Type: Hybrid (minimum three days in-office per week)
Department: Machine Learning & Artificial Intelligence
Job Overview
Plaud is seeking a candidate to turn ambiguous concepts like voice naturalness and cadence into clear, automated metrics. You will partner with ML researchers to define benchmarks for Speech LLMs, build scalable data pipelines, and own dashboards that track model health and performance.
Key Responsibilities
- Define and automate metrics for subjective concepts such as naturalness, expressiveness, and conversational cadence.
- Build reliable distributed systems and data pipelines that run at scale against live model checkpoints.
- Partner with ML researchers to translate Speech LLM capabilities (e.g., ASR robustness, TTS emotional steerability) into measurable benchmarks.
- Develop and own dashboards to track model health during training, improve signal-to-noise ratios, and reduce evaluation latency.
- Debug anomalous mid-training results to identify root causes (architecture, data, or infrastructure).
- Communicate complex statistical results and model behaviors to technical and non-technical stakeholders.
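Much of the dashboard and signal-to-noise work described above reduces to statistical checks over per-utterance evaluation scores. A minimal sketch of one such check, a paired bootstrap comparing two model checkpoints (function name and structure are illustrative, not Plaud's actual tooling):

```python
import random

def paired_bootstrap_win_rate(scores_a, scores_b, n_resamples=2000, seed=0):
    """Estimate how often checkpoint A outscores checkpoint B when the
    same evaluation set is resampled with replacement. Values near 1.0
    or 0.0 suggest a real difference; values near 0.5 suggest noise.
    Illustrative helper only."""
    rng = random.Random(seed)
    n = len(scores_a)
    wins = 0
    for _ in range(n_resamples):
        # Resample utterance indices; keep A/B paired on the same items.
        idx = [rng.randrange(n) for _ in range(n)]
        if sum(scores_a[i] for i in idx) > sum(scores_b[i] for i in idx):
            wins += 1
    return wins / n_resamples
```

Pairing the resampled indices across both models removes between-utterance variance from the comparison, which is one common way to improve the signal-to-noise ratio of checkpoint-over-checkpoint deltas.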
Required Qualifications
- Engineering Skills: Strong software engineering skills, particularly in Python, with experience in distributed systems and evaluation harnesses.
- ML Collaboration: Ability to deeply partner with researchers to define "good" performance for AI models.
- Observability: Experience building trusted tracking dashboards (e.g., Weights & Biases, MLflow).
- Communication: Ability to clearly articulate complex statistical results.
Preferred Qualifications
- Speech Metrics: Familiarity with WER, CER, PESQ, and automated MOS scoring frameworks.
- LLM-as-a-Judge: Experience using frontier or fine-tuned multi-modal LLMs to evaluate conversational logic, transcription accuracy, and audio quality.
- Human Evaluation: Background in managing large-scale crowdsourcing for RLHF/DPO efforts.
- Adversarial Datasets: Experience curating datasets to test edge cases (heavy accents, overlapping speech, noisy environments).
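As context for the speech metrics named above, word error rate (WER) is the word-level Levenshtein distance between a reference transcript and a hypothesis, normalized by the reference length. A minimal self-contained sketch (production evaluation stacks typically use an established library rather than hand-rolled dynamic programming):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + insertions + deletions) / reference words.
    Minimal illustrative implementation using an edit-distance DP table."""
    ref = reference.split()
    hyp = hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all remaining reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all remaining hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(substitution, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / max(len(ref), 1)
```

Character error rate (CER) follows the same recurrence over characters instead of words; PESQ and MOS-style scores, by contrast, rate audio quality rather than transcription accuracy.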
Compensation & Benefits
- Salary: $180,000 - $270,000 base salary, plus performance bonus and equity.
- Healthcare: Top-tier healthcare for employees and dependents, including dental and vision.
- Retirement: 401(k) with company matching.
- Time Off: Unlimited PTO plus 13 paid holidays.
- Parental Leave: 12 weeks of paid leave for all new parents.
- Equipment: Choice of top-of-the-line laptops/workstations.
- Perks: Annual offsites and fully stocked office.