
96. Jan Leike - AI alignment at OpenAI

Towards Data Science


Reinforcement Learning From Human Feedback

In training, it's just trying to predict the next token. How would you shift from that kind of framework to one where it's trying to actually do the thing that it's being asked to do? Does that make sense? Yeah, we're not actually changing the GPT-3 training procedure. What we're doing is taking the trained model and then fine-tuning it.
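To make the idea concrete, here is a minimal sketch of the pattern Leike describes: leave the next-token pretraining objective untouched, take the already-trained model, and fine-tune it on demonstrations of the desired behavior (the supervised stage that precedes reward modeling in RLHF). It assumes Hugging Face `transformers` and uses `gpt2` as a stand-in checkpoint; the demonstration data and hyperparameters are illustrative placeholders, not OpenAI's actual setup.

```python
# Sketch: fine-tune a pretrained causal LM on human demonstrations.
# Keeps the same next-token loss as pretraining, but on curated examples
# of the behavior we want. Model, data, and lr are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in for a large pretrained model like GPT-3
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Hypothetical human-written demonstrations: prompt -> desired completion.
demonstrations = [
    ("Summarize: The meeting covered Q3 results and next steps.",
     "The meeting reviewed Q3 results and agreed on next steps."),
]

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
model.train()
for prompt, completion in demonstrations:
    # Same objective as pretraining (predict the next token), applied to
    # prompt + desired completion so the model learns to "do the thing".
    ids = tokenizer(prompt + " " + completion, return_tensors="pt").input_ids
    loss = model(ids, labels=ids).loss  # labels=ids gives the shifted LM loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

In full RLHF this supervised step is followed by training a reward model on human preference comparisons and then optimizing the policy against it (e.g. with PPO); the sketch above shows only the first stage.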
