John Schulman

TalkRL: The Reinforcement Learning Podcast

What Language Models Can Learn From the Internet?

The model is basically an ensemble of all these people who wrote stuff on the internet. When you feed it a prompt, what it's doing internally has to be something like figuring out who wrote the prompt and then trying to continue in that style. It forces the model to determine if things are true or not. I think with RL fine-tuning, there's a lot more potential for the model to output something truthful, as opposed to trying to imitate a certain style.
