What are the responsibilities and job description for the Engineering Manager, Evaluation Platform position at ChatGPT Jobs?

Job Description

Engineering Manager, Evaluation Platform

Location: Austin, TX

Hybrid (on-site 2 days per week in the Austin office)

Company: Procore (Construction Intelligence organization)

Reports to: Sr Director, Procore AI Engineering

Machine Learning & Artificial Intelligence

Job Summary

This role builds infrastructure and tooling to measure, benchmark, and improve the quality of AI agents (Search Agent, RFI Create Agent, Invoice Agent, and others). It owns the end-to-end evaluation lifecycle: defining quality metrics, building evaluation frameworks, and delivering interfaces that turn evaluation results into actionable insights.

What You'll Do

Lead and grow a team of engineers focused on evaluation infrastructure, quality measurement, and developer tooling for AI agents.
Define the technical vision and roadmap for the Evaluation Platform, covering both offline and online evaluations.
Partner with AI/ML, Product, and Agent teams to define quality metrics (relevance, accuracy, latency, safety, user satisfaction, token usage) and build automated pipelines to measure them.
Design and deliver user-facing evaluation tools for assessing agent output quality, comparing model versions, and identifying regressions.
Build frameworks for human-in-the-loop evaluation (annotation workflows, rating interfaces, inter-rater reliability).
Establish CI/CD quality gates for agent version releases.
Drive engineering excellence (code quality, system reliability, test coverage, on-call health, technical debt management).
Recruit, mentor, and develop engineers, fostering a culture of ownership and rigorous experimentation.

What We're Looking For

5 years of experience managing engineering teams or serving as a technical lead, with 7 years of total software engineering experience.
Experience building evaluation, quality measurement, or observability platforms for LLM-based or agentic systems (RAG pipelines, multi-step agents, tool-use agents).
Strong understanding of evaluation methodologies (precision/recall, LLM-as-judge, human annotation, A/B testing, statistical significance).
Proven ability to translate ambiguous problem spaces into clear technical strategies and executable roadmaps.
Hands-on technical depth in backend systems, data pipelines, or distributed infrastructure (Python, Go, or similar).
Familiarity with evaluation frameworks such as RAGAS, DeepEval, Langfuse, or custom eval harnesses.
Background in search relevance (NDCG, MRR) or information retrieval quality systems.
Experience with construction-tech, procurement, or enterprise B2B SaaS domains (preferred).

Compensation & Benefits

Base Pay Range: $168,560 - $231,770 USD annually

This role is eligible for equity compensation and/or bonus incentive compensation. Actual compensation is based on job-related skills, experience, education/training, and location.

For Los Angeles County (unincorporated) Candidates: Procore will consider for employment all qualified applicants, including those with arrest or conviction records, in accordance with applicable laws.