Is the Human in the Loop Aspect of Training Important?

DaVinci 3 was the first commercial model that had RLHF on top. Most of this sort of proxy for pragmatics in DaVinci 1 and 2, and in models of that instruction-following class, comes from how they are trained: it comes from just being trained on a different kind of data, right? And so the evidence shows that, at least when it comes to resolving these sorts of binary conversational implicatures, that's a very strong signal, which is astounding. But it tells us there's a lot to be gained from thinking about just what kind of data we throw at them rather than just thinking about more data.
