The Limits of Transparency in Language Models

The GPT-4 paper says that they believe it's a matter of safety to not disclose anything about the training data. In order to be able to use this safely, we need to know what's in it. The way they're getting this ersatz fluency is by taking everything they can possibly get their hands on. That strategy leads to these data sets that are probably too big to document thoroughly.
