How to Improve Language Models Through Value-Based Reinforcement

A lot of the ways that people do RL with language models now treats the language models task as a one step problem. But if we're thinking about counterfactuals, that is typically situated in a multi-step process. So I think there's actually a lot of potential to get much more powerful language models with appropriate value-based reinforcement.

Play episode from 52:30

Transcript

The AI-powered Podcast Player

Save insights by tapping your headphones, chat with episodes, discover the best highlights - and more!

Get the app