The Distribution Mismatch Between Train and Test Sets

I would certainly love to see more reporting of these going forward, but at the same time, I think there's also a lot you can gain from in-domain error analysis as well, right? So I think these are all just different tools to help us understand how these models behave. We intentionally never put it on the leaderboard, for instance, because we felt that it would just encourage people to find ways to game that particular robustness metric and not actually be more robust in general. Having humans in the loop at some level is going to be critical, but I can definitely imagine semi-automated ways of doing stuff too, if you want to talk about before I wrap up.

