Reinforcement Learning From Human Feedback
In the next step of the process that we'll describe, humans are taken back out of the loop to fine-tune the model. But they're the central piece of this three-step process, which starts with a language model at one end and ends with a reinforcement-learning-trained model at the other. There are some diagrams in the post that I think are quite helpful. It's a bit hard to convey on a podcast, but I hope it makes some sense.