Reinforcement Learning and the ChatGPT System
John Schulman gave a talk about the ChatGPT system, and I'm curious about your take on the way he presented it. His take was that you train on the entire internet, the pre-training, and the purpose of that is for the model to essentially know everything that's out there, but at that point it has no clue what actually matters. And then the fine-tuning doesn't give it any new knowledge; it just tells it: of everything that you already know, use these things when you have a conversation. Does that resonate with you? Does it seem different to you?

I think it's a really good framing of what's happening through reinforcement learning. Another way to maybe say the same thing,