OpenAI's Caginess About Disclosing Training Data

In general, proving that a model was trained on a certain kind of data is pretty difficult. And I think that's actually probably why OpenAI is now super cagey about saying exactly what they train on. So we academics were like, we demand it, for good science, and there are very good reasons for that. But there are also some really interesting tests that people have released for trying to figure out whether or not certain items exist in the data set.

