What are the responsibilities and job description for the Member of Technical Staff - RL Algorithms position at Vmax?
About Vmax
Vmax is an applied research lab developing AI capable of open-ended learning. We are building systems to exceed humans in all capacities by optimising beyond the local maxima of learning from human expertise.
About the role
RL has become the de facto method for post-training LLMs. We are limited by the sample efficiency of the policy gradient algorithms in use today, and are looking for a talented researcher to weave together pre-LLM and post-LLM approaches to learning from experience.
Responsibilities
- Develop new RL algorithms for post-training language models.
- Adapt ideas from pre-LLM reinforcement learning, such as model-based RL, temporal abstraction, and value-based learning, to modern LLM and agentic settings.
- Establish empirical baselines and evaluation protocols for measuring sample efficiency, robustness, generalization, and reward exploitation in LLM RL.
- Analyze failure modes of RL-trained models, including reward hacking, mode collapse, over-optimization, exploration failures, and distribution shift.
- Collaborate with researchers working on environments, evals, interpretability, reward modeling, and infrastructure to turn algorithmic ideas into reliable training systems.
- Own and develop a research agenda within Vmax, from identifying promising directions to executing experiments and communicating results.
Qualifications
- PhD or equivalent experience in machine learning, reinforcement learning, or a closely related field.
- Track record of research excellence, as demonstrated by publications, open source work, deployed AI systems, or other substantial technical contributions.
- Deep understanding of modern machine learning, especially reinforcement learning, representation learning, and large language models.
- Strong familiarity with LLM post-training methods.
- Experience designing and running rigorous ML experiments, including ablations, baselines, evaluation design, and failure analysis.
- Experience with large-scale ML infrastructure, distributed training, experiment tracking, data pipelines, and debugging unstable training runs.
- Expertise with Python and at least one major ML framework such as PyTorch or JAX.
- Ability to work independently on open-ended research problems and turn ambiguous ideas into concrete experimental programs.
- Experience developing new RL algorithms or improving existing ones in domains such as robotics, games, simulated control, language models, or agents.
- Experience with LLM pre-training.
- Strong understanding of reward modeling, verifiers, process supervision, outcome supervision, or automated evaluation systems.
- Demonstrated software engineering ability.
- Strong communication skills, especially the ability to explain algorithmic ideas, empirical results, and research implications to both technical and non-technical audiences.