
John Schulman on dead ends, scaling RL, and building research institutions

Cursor


Why value functions fell out of favor

John observes that, despite the variance-reduction theory behind them, value functions currently give little benefit on modern RL-from-human-feedback tasks.

From 16:57 in the episode.
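For readers unfamiliar with the variance-reduction theory mentioned above, here is a minimal toy sketch (not from the episode). It assumes a hypothetical two-action bandit with a softmax policy and exact mean rewards of 10 and 11. It compares single-sample REINFORCE gradient estimates with and without subtracting the state value V as a baseline. The value here is computed exactly instead of being learned, which is the idealized case. The sketch only illustrates the textbook argument and says nothing about John's observation on RLHF tasks.

```python
import math
import random
import statistics


def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def grad_log_prob(logits, action):
    # d/d logit_k of log softmax(logits)[action] = 1[k == action] - p_k
    probs = softmax(logits)
    return [(1.0 if k == action else 0.0) - p for k, p in enumerate(probs)]


def reinforce_samples(logits, mean_rewards, baseline, n, rng, noise=0.1):
    """Return n single-sample estimates of dJ/d logit_0 (hypothetical toy bandit)."""
    probs = softmax(logits)
    estimates = []
    for _ in range(n):
        action = rng.choices(range(len(probs)), weights=probs)[0]
        reward = mean_rewards[action] + rng.gauss(0.0, noise)
        g = grad_log_prob(logits, action)
        # Subtracting a baseline keeps the estimator unbiased but can shrink its variance.
        estimates.append((reward - baseline) * g[0])
    return estimates


def compare_variances(n=20000, seed=0):
    rng = random.Random(seed)
    logits = [0.0, 0.0]
    mean_rewards = [10.0, 11.0]
    # Exact state value V = E[r] under the current policy (10.5 here).
    value = sum(p * r for p, r in zip(softmax(logits), mean_rewards))
    no_base = reinforce_samples(logits, mean_rewards, 0.0, n, rng)
    with_base = reinforce_samples(logits, mean_rewards, value, n, rng)
    return (statistics.mean(no_base), statistics.variance(no_base),
            statistics.mean(with_base), statistics.variance(with_base))


if __name__ == "__main__":
    mean_nb, var_nb, mean_b, var_b = compare_variances()
    print(f"no baseline:    mean={mean_nb:.3f} var={var_nb:.4f}")
    print(f"value baseline: mean={mean_b:.3f} var={var_b:.4f}")
```

Both estimators target the same true gradient (-0.25 in this setup), but analytically the per-sample variance is roughly 27.6 without the baseline versus about 0.0025 with it. That gap is the theoretical motivation for value functions, which the episode suggests is not translating into much practical benefit on current tasks.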
