The Human Refinement Learning Paradigm

The big breakthrough in the last few years is this InstructGPT sort of thing. You train your model on all the data on the internet, and what you end up with is something that's not aligned. So then you go through a fine-tuning, or alignment, or instruction-tuning phase, or whatever you want to call it, where you give it lots of examples of good behavior and bad behavior, and then you adjust the model weights. This is reinforcement learning from human feedback.
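In the published InstructGPT recipe, the "good behavior and bad behavior" step is roughly a reward model trained on human comparisons of pairs of outputs. That reward model then guides a reinforcement-learning update of the language model itself. The sketch below covers only the reward-model part and makes several assumptions. It uses a toy linear model over made-up feature vectors in place of a neural network. The names `reward`, `pairwise_loss`, and `train_reward_model`, and the "helpfulness/toxicity" features, are all hypothetical. It minimizes the pairwise loss -log σ(r(preferred) - r(rejected)).

```python
import math

def reward(weights, features):
    # Hypothetical toy reward model: a linear score r(x) = w . x
    # (a stand-in for the neural reward model used in practice).
    return sum(w * f for w, f in zip(weights, features))

def pairwise_loss(weights, preferred, rejected):
    # Pairwise comparison loss: -log sigmoid(r(preferred) - r(rejected)),
    # written as log(1 + exp(-margin)).
    margin = reward(weights, preferred) - reward(weights, rejected)
    return math.log1p(math.exp(-margin))

def train_reward_model(pairs, dim, lr=0.1, epochs=200):
    # Plain gradient descent on the pairwise loss over human comparisons.
    weights = [0.0] * dim
    for _ in range(epochs):
        for preferred, rejected in pairs:
            margin = reward(weights, preferred) - reward(weights, rejected)
            # dLoss/dmargin = -(1 - sigmoid(margin)); dmargin/dw = preferred - rejected
            coeff = 1.0 - 1.0 / (1.0 + math.exp(-margin))
            weights = [w + lr * coeff * (p - r)
                       for w, p, r in zip(weights, preferred, rejected)]
    return weights

# Hypothetical toy data: each response is a feature vector [helpfulness, toxicity],
# and each pair is (response a human preferred, response a human rejected).
comparisons = [
    ([1.0, 0.0], [0.0, 1.0]),
    ([0.8, 0.1], [0.3, 0.9]),
]
weights = train_reward_model(comparisons, dim=2)
print(weights)
```

After training, the toy model scores the preferred response in each pair above the rejected one. In the full method, a score like this would then serve as the reward signal when the language model's weights are adjusted with reinforcement learning (PPO in InstructGPT), a step that is not shown here.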

Play episode from 19:08
