How ChatGPT Was Trained

ChatGPT was trained using a reinforcement learning approach. Like other models trained this way, it is built on a pretrained language model.

The first reward model is trained to take in a prompt and a response and score the response the way a human would, according to preference.

The second reward model is trained to take in a prompt and a response and output a prediction of the human preference for that response.
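The idea of a reward model that takes a prompt and a response and returns a preference score can be sketched in a few lines of Python. This is a toy illustration only, not how ChatGPT's reward model was actually built: the bag-of-words featurizer, the pairwise logistic loss, the example data, and every function name here are assumptions introduced for the sketch.

```python
import math

# Hypothetical preference data: (prompt, response a human preferred, response a human rejected).
PAIRS = [
    ("How do I reset my password?", "Open settings and choose reset password.", "I don't know."),
    ("What is 2 + 2?", "2 + 2 equals 4.", "Math is hard."),
]

def features(prompt: str, response: str) -> dict:
    """Toy bag-of-words counts over the response.

    Assumption: a real reward model would encode prompt and response together
    with a pretrained language model; this toy featurizer ignores the prompt.
    """
    feats = {}
    for tok in response.lower().split():
        feats[tok] = feats.get(tok, 0.0) + 1.0
    return feats

def score(weights: dict, prompt: str, response: str) -> float:
    """Scalar reward: higher means 'more likely to be preferred by a human'."""
    return sum(weights.get(k, 0.0) * v for k, v in features(prompt, response).items())

def train_reward_model(pairs, epochs: int = 200, lr: float = 0.1) -> dict:
    """Fit linear weights with a pairwise logistic loss: -log(sigmoid(r_chosen - r_rejected))."""
    weights = {}
    for _ in range(epochs):
        for prompt, chosen, rejected in pairs:
            margin = score(weights, prompt, chosen) - score(weights, prompt, rejected)
            # Step size shrinks as the model ranks the pair more confidently.
            g = 1.0 - 1.0 / (1.0 + math.exp(-margin))
            for k, v in features(prompt, chosen).items():
                weights[k] = weights.get(k, 0.0) + lr * g * v
            for k, v in features(prompt, rejected).items():
                weights[k] = weights.get(k, 0.0) - lr * g * v
    return weights

weights = train_reward_model(PAIRS)
for prompt, chosen, rejected in PAIRS:
    print(prompt, score(weights, prompt, chosen) > score(weights, prompt, rejected))
```

In a reinforcement learning setup of the kind described above, a score like this would serve as the reward signal when further training the pretrained language model.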

