Jeffrey Toobin: The pre-training is where you can kind of just pop up anywhere in, like, the full history of the Internet. And it's just like one giant run-on sentence, you know, and so I think that's really a blog post about your favorite things Detroit by Tyler Cowen, because that wasn't framed as something it could autocomplete. So then the instruction tuning comes in and kind of makes that much clearer. Then the reinforcement learning with the reward model and the sort of feedback dynamic takes that to another level.

Toobin: Do you see those as qualitatively different, or just kind of more of the same thing? Because this instruction tuning, all right
