Applied Reinforcement Learning Engineer
Location: Palo Alto, CA or Seattle, WA (Hybrid/Remote)
Salary: $150K – $300K Annually
About Centific
Centific is a frontier AI data foundry that curates diverse, high-quality data, using our purpose-built technology platforms to empower the Magnificent Seven and our enterprise clients with safe, scalable AI deployment. Our team includes more than 150 PhDs and data scientists, along with 4,000 AI practitioners and engineers, and an integrated ecosystem of 1.8 million vertical domain experts across 230 markets. Our zero-distance innovation™ solutions for GenAI can reduce GenAI costs by up to 80% and bring solutions to market 50% faster.
About the Team
Centific AI Research advances foundational AI models and applications through reinforcement learning, alignment, and human-centered intelligence. We're building governed simulation environments that let enterprises safely iterate and improve AI agent workflows — bridging human-labeled signal creation with automated post-training for high-stakes operations.
The Role
You'll build simulation environments that mirror real enterprise workflows and post-train LLM agents inside them. Your environments, reward functions, and verifiers become the training ground for production agents handling document processing, compliance, customer operations, and multi-step reasoning across regulated industries.
This role sits at the intersection of LLM post-training research and production engineering. You'll translate customer workflows into bespoke environments, design reward signals that hold up under optimization pressure, and ship pipelines that turn human-labeled traces into measurable agent improvements.
What You'll Do
- Design simulation environments and digital twins for enterprise workflows
- Post-train LLM agents using the right method for the task — RLHF, DPO, GRPO, PPO, and whatever comes next
- Build pipelines that turn human-labeled traces and verifiable signals into training data
- Architect multi-turn, tool-using agents with closed learning loops
- Design reward functions and verifiers that resist reward hacking and reflect real task outcomes
- Translate research into production; contribute to publications
Required Qualifications
- At least 3 years fine-tuning LLMs, including hands-on experience in RL post-training
- Experience building or training LLM-based agents — tool use, multi-turn reasoning, trajectory evaluation
- Strong Python and software engineering skills; comfortable building pipelines, not just notebooks
- Working knowledge of modern post-training and rollout-serving libraries
- MS/PhD in CS, ML, or related field, or equivalent industry experience
Preferred Qualifications
- Publications at NeurIPS, ICML, ICLR, ACL, COLM, or similar venues
- Open-source contributions to post-training or agent frameworks (TRL, veRL, OpenRLHF, SkyRL, or similar)
- Background in classical RL
- Domain experience in healthcare, finance, logistics, or compliance
- Experience with synthetic data generation, simulation, or world models
- Distributed training experience
Why Join Centific
- Lead the frontier. Shape a new discipline at the intersection of post-training, simulation, and enterprise AI
- Ship your science. See your research power real systems across healthcare, finance, and safety-critical operations
- Collaborate with leaders. Work alongside NVIDIA, Microsoft, and the global AI community
- Build what matters. Create governed, compliant AI systems enterprises can actually trust
Learn more about us at centific.com.
Centific is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, national origin, ancestry, citizenship status, age, mental or physical disability, medical condition, sex (including pregnancy), gender identity or expression, sexual orientation, marital status, familial status, veteran status, or any other characteristic protected by applicable law. We consider qualified applicants regardless of criminal histories, consistent with legal requirements.