How AI will change programming with Amjad Masad, CEO of Replit

Founders in Arms

Reinforcement Learning From Human Feedback

Reinforcement learning is like, you know, when we were playing games in the 90s, a lot of the AIs there were trained via some kind of reinforcement learning: reward and punishment. And RL from human feedback is basically someone using chat, where you have thumbs up and thumbs down. So you have a group of people doing that, and then you take all their data and responses and you generate a policy from it, a reward policy. Then you train the large language model using reinforcement learning with that reward policy. And then it starts behaving a little more like what a human prefers.
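The loop described here (collect thumbs up and thumbs down, turn them into a reward signal, then train against that signal) can be sketched in a few lines of Python. This is a toy illustration under loud assumptions, not how any production system does it: the "model" only chooses among three canned responses, the reward is a smoothed thumbs-up rate rather than a learned neural network, and the update is a plain exact policy gradient rather than an algorithm such as PPO. The names `fit_reward_model` and `train_policy` and the feedback data are hypothetical.

```python
import math

def fit_reward_model(feedback, num_responses):
    """Turn thumbs-up/down votes into one reward per response, in [-1, 1]."""
    ups = [0] * num_responses
    totals = [0] * num_responses
    for response_id, thumbs_up in feedback:
        totals[response_id] += 1
        ups[response_id] += 1 if thumbs_up else 0
    # Laplace smoothing: a response with no votes gets a neutral reward of 0.
    return [2 * (u + 1) / (t + 2) - 1 for u, t in zip(ups, totals)]

def softmax(logits):
    """Convert logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def train_policy(rewards, steps=500, lr=0.5):
    """Gradient ascent on expected reward for a softmax policy."""
    logits = [0.0] * len(rewards)
    for _ in range(steps):
        probs = softmax(logits)
        expected = sum(p * r for p, r in zip(probs, rewards))
        # Exact gradient of expected reward with respect to each logit.
        logits = [l + lr * p * (r - expected)
                  for l, p, r in zip(logits, probs, rewards)]
    return softmax(logits)

# Hypothetical votes: (response_id, thumbs_up) pairs from a group of raters.
feedback = [(0, False), (0, False), (1, True), (1, False),
            (2, True), (2, True), (2, True)]

rewards = fit_reward_model(feedback, num_responses=3)
policy = train_policy(rewards)
print("rewards:", rewards)
print("policy:", policy)
```

After training, the policy puts most of its probability on the response the raters liked best, which is the "behaving a little more like what a human prefers" step in miniature.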
