
John Schulman
TalkRL: The Reinforcement Learning Podcast
What Language Models Can Learn From the Internet?
The model is basically an ensemble of all the people who wrote things on the internet. When you feed it a prompt, what it does internally has to be something like figuring out who wrote the prompt and then trying to continue in that style. It forces the model to determine whether things are true or not. I think with RL fine-tuning there's a lot more potential for the model to output something truthful, as opposed to trying to imitate a certain style.