The Artifacts Problem in a Data Set

In our dataset we report two types of scores: there's the label-only accuracy, and there's the accuracy conditioned on finding the right evidence. This hypothesis-only-style evaluation gives us a score of about 50%, which is comparable to the MultiNLI dataset. So yes, that's significantly above chance, but at least it's not way higher than that. It also shows that the artifacts problem is just as much of a problem for us as it is for NLI-style datasets if we ignore the requirement for evidence.
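To make the two scores concrete, here is a minimal Python sketch of how they might be computed. It is not the speaker's actual evaluation code: the `Prediction` record, its field names, and the rule that a prediction counts only when its evidence covers all of the gold evidence are assumptions made for illustration. The transcript does not say exactly how "finding the right evidence" is judged.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    """One evaluated example (hypothetical record layout)."""
    gold_label: str
    pred_label: str
    gold_evidence: frozenset  # ids of the annotated evidence sentences
    pred_evidence: frozenset  # ids of the sentences the model selected


def label_only_accuracy(preds):
    """Fraction of examples whose predicted label matches the gold label."""
    if not preds:
        return 0.0
    return sum(p.pred_label == p.gold_label for p in preds) / len(preds)


def evidence_conditioned_accuracy(preds):
    """Fraction of examples with the right label AND the right evidence.

    Assumption: evidence is "right" when the predicted set contains
    every gold evidence id.
    """
    if not preds:
        return 0.0
    hits = sum(
        p.pred_label == p.gold_label and p.pred_evidence >= p.gold_evidence
        for p in preds
    )
    return hits / len(preds)


# Toy data, invented purely for illustration.
examples = [
    Prediction("entail", "entail", frozenset({1}), frozenset({1, 2})),
    Prediction("contradict", "contradict", frozenset({3}), frozenset({4})),
    Prediction("neutral", "neutral", frozenset({5}), frozenset()),
    Prediction("entail", "neutral", frozenset({6}), frozenset({6})),
]

label_acc = label_only_accuracy(examples)
evidence_acc = evidence_conditioned_accuracy(examples)
```

On this toy data the label-only score comes out well above the evidence-conditioned one, because a model can guess the label without having located the supporting evidence. That gap is what the evidence requirement is meant to expose.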
