Why Do We Use IID Train and Test Sets From the Same Distribution?

Twenty years ago, machine learning was basically like, look, you've got your test set and your training set. And so long as they're from the same distribution, we're just going to assume that your test data has all the behaviors that you're going to need to worry about. Just make sure you've got good accuracy on your held-out test set. But when we go and deploy a model in the real world, it's pretty unlikely that the data that model encounters is going to be from exactly the same distribution as the data in our limited historical snapshot.
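
A minimal sketch of the point above, assuming a toy one-dimensional problem. The ground-truth rule, the input ranges, and the threshold learner are all hypothetical illustrations, not anything described in the episode. A model fit on a historical snapshot scores well on a held-out test set drawn from that same range, but the same model is evaluated again on inputs drawn from a shifted range (covariate shift).

```python
import random


def true_label(x: float) -> int:
    # Hypothetical ground truth: positive only inside the interval (0, 2).
    return 1 if 0 < x < 2 else 0


def sample(n: int, low: float, high: float, rng: random.Random) -> list[tuple[float, int]]:
    # Draw n inputs uniformly from [low, high] and label them with the ground truth.
    xs = [rng.uniform(low, high) for _ in range(n)]
    return [(x, true_label(x)) for x in xs]


def accuracy(t: float, data: list[tuple[float, int]]) -> float:
    # Fraction of points where the rule "predict 1 if x > t" matches the label.
    return sum((1 if x > t else 0) == y for x, y in data) / len(data)


def fit_threshold(data: list[tuple[float, int]]) -> float:
    # Try every observed x as a threshold; keep the one with the best training accuracy.
    return max((x for x, _ in data), key=lambda t: accuracy(t, data))


rng = random.Random(0)
train = sample(500, -2.0, 1.0, rng)     # hypothetical historical snapshot
test_iid = sample(500, -2.0, 1.0, rng)  # held-out test set, same distribution
deployed = sample(500, 1.0, 4.0, rng)   # hypothetical deployment data, shifted inputs

t = fit_threshold(train)
iid_acc = accuracy(t, test_iid)
shifted_acc = accuracy(t, deployed)
print(f"threshold={t:.3f}  held-out acc={iid_acc:.3f}  shifted acc={shifted_acc:.3f}")
```

In this toy setup the held-out accuracy stays close to perfect, while accuracy on the shifted inputs falls to roughly one third: the learned rule "x > t" was only ever checked on the range the historical data covered, so the IID test set had no way to reveal that it breaks beyond x = 2.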
