How to Fine-Tune a Model to Respond to Commands

A model that hasn't yet had this sort of specific human feedback to make it respond to commands feels a little bit schizophrenic. It's hard to get it to do what you want. You're trying to give it instructions and it might break in some very unpredictable way. So what exactly is happening in that fine-tuning phase with the human feedback? What exactly are you doing?

