The RLVR Revolution — with Nathan Lambert (AI2, Interconnects.ai)

Latent Space: The AI Engineer Podcast

Navigating Over-Optimization in RL

This chapter covers over-optimization in reinforcement learning in three settings: RL control, RLHF, and RLVR. It discusses why reward design is difficult, showing how poorly structured rewards can lead to unintended model behaviors, especially in mixed domains like coding and mathematics.
