What are the responsibilities and job description for the AI/ML Research Scientist, LLM Post-Training & Evaluation position at ChatGPT Jobs?
Job Description
Research Scientist, LLM Evaluation & Post-Training
Company: Centific
Location: Palo Alto, CA or Seattle, WA (Hybrid/Remote)
Type: Full-time
Salary: $150K - $160K Annually
About The Role
Centific is seeking a Research Scientist focused on LLM evaluation and post-training. This role involves defining and executing research agendas, developing evaluation frameworks, analyzing model behavior, and collaborating with cross-functional teams and customer stakeholders. The goal is to improve LLM evaluation methodologies and drive advancements in AI deployment.
Key Responsibilities
- Define and execute research on LLM evaluation and post-training.
- Develop and validate comprehensive evaluation frameworks for LLM and multimodal systems.
- Lead research in frontier evaluation domains (long-context, cross-modal, dynamic multi-turn).
- Analyze model behavior and provide recommendations for improvement.
- Collaborate with data scientists and ML engineers on evaluation and training pipelines.
- Engage with customer technical stakeholders to understand evaluation goals and provide recommendations.
- Contribute to knowledge creation through datasets, frameworks, reports, and publications.
- Promote thought leadership in LLM evaluation and post-training.
Qualifications
- MS or PhD in Computer Science, Machine Learning, Statistics, Applied Mathematics, AI, or related quantitative field (PhD preferred).
- 5 years of relevant experience in applied ML research, with substantial work in LLMs or foundation models.
- Demonstrated experience with LLM evaluation, benchmarking, alignment, post-training, or model quality research.
- Strong foundation in experimental design, statistical analysis, and scientific reasoning for ML systems.
- Strong Python coding skills for research, data processing, and ML frameworks (PyTorch, Hugging Face, JAX/TensorFlow).
- Ability to evaluate and compare human and automated evaluation methods.
- Strong written and verbal communication skills.
Preferred Qualifications
- Hands-on experience with fine-tuning or post-training experiments (SFT, preference optimization, RLHF/RLAIF).
- Experience with multimodal and long-context evaluation.
- Experience designing multi-turn, interactive, or agentic evaluation protocols.
- Publications or open-source contributions in LLM evaluation at top venues.
- Experience in customer-facing applied research or technical consulting.
- Familiarity with safety, trustworthiness, and governance in GenAI evaluation.