Episode 28: Sergey Levine, UC Berkeley, on the bottlenecks to generalization in reinforcement learning, why simulation is doomed to succeed, and how to pick good research problems

Generally Intelligent

How to Improve Language Models Through Value-Based Reinforcement

A lot of the ways that people do RL with language models now treat the language model's task as a one-step problem. But if we're thinking about counterfactuals, that is typically situated in a multi-step process. So I think there's actually a lot of potential to get much more powerful language models with appropriate value-based reinforcement.
