
96. Jan Leike - AI alignment at OpenAI

Towards Data Science


Reinforcement Learning From Human Feedback

In training, it's just trying to predict the next token. How would you shift from that kind of framework to one where it's trying to actually do the thing that it's being asked to do? Does that make sense? Yeah, we're not actually changing the GPT-3 training procedure. What we're doing is taking the trained model and then fine-tuning it.
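To make the idea concrete, here is a minimal sketch of the pattern Leike describes: leave the next-token pretraining objective untouched, take the already-trained model, and fine-tune it on demonstrations of the desired behavior (the supervised stage that precedes reward modeling in RLHF). It assumes Hugging Face `transformers` and uses `gpt2` as a stand-in checkpoint; the demonstration data and hyperparameters are illustrative placeholders, not OpenAI's actual setup.

```python
# Sketch: fine-tune a pretrained causal LM on human demonstrations.
# Keeps the same next-token loss as pretraining, but on curated examples
# of the behavior we want. Model, data, and lr are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in for a large pretrained model like GPT-3
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Hypothetical human-written demonstrations: prompt -> desired completion.
demonstrations = [
    ("Summarize: The meeting covered Q3 results and next steps.",
     "The meeting reviewed Q3 results and agreed on next steps."),
]

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
model.train()
for prompt, completion in demonstrations:
    # Same objective as pretraining (predict the next token), applied to
    # prompt + desired completion so the model learns to "do the thing".
    ids = tokenizer(prompt + " " + completion, return_tensors="pt").input_ids
    loss = model(ids, labels=ids).loss  # labels=ids gives the shifted LM loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

In full RLHF this supervised step is followed by training a reward model on human preference comparisons and then optimizing the policy against it (e.g. with PPO); the sketch above shows only the first stage.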
