Why value functions fell out of favor

John observes that value functions currently provide little benefit on modern RL-from-human-feedback tasks, even though theory says they should reduce variance.

Play episode from 16:57

