
96. Jan Leike - AI alignment at OpenAI

Towards Data Science


Reinforcement Learning From Human Feedback

In training, it's trying to just predict the next token. How would you shift from that kind of framework to it's trying to actually do the thing that it's being asked to do, in some sense?

Yeah, we're not actually changing the GPT-3 training procedure. What we're doing is taking the trained model, and then we're fine-tuning it.
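One concrete way to picture this: the sketch below loads an already-pretrained language model and fine-tunes it on new examples with the ordinary next-token objective, leaving the pretraining procedure itself untouched. This is a minimal illustration of that one point, not OpenAI's pipeline (the full RLHF setup also trains a reward model from human preferences and optimizes the policy against it); GPT-2 stands in for GPT-3, and the example data and learning rate are assumptions.

```python
# Minimal sketch: pretraining is left as-is; we load the already-trained
# model and fine-tune it on new data. GPT-2 stands in for GPT-3; the demo
# strings and hyperparameters are hypothetical placeholders.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

model = GPT2LMHeadModel.from_pretrained("gpt2")  # pretrained weights, unchanged
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# Hypothetical examples standing in for human-curated fine-tuning data.
texts = [
    "Q: What is the capital of France? A: Paris.",
    "Q: Summarize: the cat sat on the mat. A: A cat sat down.",
]
batch = tokenizer(texts, return_tensors="pt", padding=True)
labels = batch["input_ids"].clone()
labels[batch["attention_mask"] == 0] = -100  # exclude padding from the loss

model.train()
outputs = model(**batch, labels=labels)  # same next-token objective, new data
outputs.loss.backward()
optimizer.step()
```

The point from the excerpt shows up directly in the code: `from_pretrained` pulls in the fixed result of pretraining, and only the subsequent fine-tuning step updates the weights.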

