Reinforcement Learning and the ChatGPT System

John Schulman gave a talk about the ChatGPT system, and I'm curious about your take on the way he presented it. His take was that you train on the entire internet, the pre-training, and the purpose of that is for the model to essentially know everything that's out there, but at that point it has no clue what actually matters. Then the fine-tuning doesn't give it any new knowledge; it just tells it: of everything you already know, use these things when you have a conversation. Does that resonate with you? Does it seem different to you?

I think it's a really good framing of what's happening through reinforcement learning. Another way to maybe say the same thing,

