Job Description
Engineering Manager, Evaluation Platform
Location: Austin, TX
- Hybrid (on-site 2 days per week in the Austin office)
Reports to: Sr Director, Procore AI Engineering
Machine Learning & Artificial Intelligence
Job Summary
Build infrastructure and tooling to measure, benchmark, and improve the quality of AI agents (Search Agent, RFI Create Agent, Invoice Agent, etc.). Own the end-to-end evaluation lifecycle: defining quality metrics, building evaluation frameworks, and delivering interfaces that turn results into actionable insights.
What You'll Do
- Lead and grow a team of engineers focused on evaluation infrastructure, quality measurement, and developer tooling for AI agents.
- Define the technical vision and roadmap for the Evaluation Platform, covering both offline and online evaluations.
- Partner with AI/ML, Product, and Agent teams to define quality metrics (relevance, accuracy, latency, safety, user satisfaction, token usage) and build automated evaluation pipelines.
- Design and deliver user-facing evaluation tools for assessing agent output quality, comparing model versions, and identifying regressions.
- Build frameworks for human-in-the-loop evaluation (annotation workflows, rating interfaces, inter-rater reliability).
- Establish CI/CD quality gates for agent version releases.
- Drive engineering excellence (code quality, system reliability, test coverage, on-call health, technical debt management).
- Recruit, mentor, and develop engineers, fostering a culture of ownership and rigorous experimentation.
Qualifications
- 5 years of experience managing engineering teams or serving as a technical lead, and 7 years of total software engineering experience.
- Experience building evaluation, quality measurement, or observability platforms for LLM-based or agentic systems (RAG pipelines, multi-step agents, tool-use agents).
- Strong understanding of evaluation methodologies (precision/recall, LLM-as-judge, human annotation, A/B testing, statistical significance).
- Proven ability to translate ambiguous problem spaces into clear technical strategies and executable roadmaps.
- Hands-on technical depth in backend systems, data pipelines, or distributed infrastructure (Python, Go, or similar).
- Familiarity with evaluation frameworks such as RAGAS, DeepEval, or Langfuse, or with custom evaluation harnesses.
- Background in search relevance (NDCG, MRR) or information retrieval quality systems.
- Experience in construction technology, procurement, or enterprise B2B SaaS domains (preferred).
Base Pay Range: $168,560.00 - $231,770.00 USD Annual
Eligible for Equity Compensation and/or Bonus Incentive Compensation. Actual compensation based on job-related skills, experience, education/training, and location.
For Los Angeles County (unincorporated) Candidates: Procore will consider for employment all qualified applicants, including those with arrest or conviction records, in accordance with applicable laws.