
60 - FEVER: a large-scale dataset for Fact Extraction and VERification, with James Thorne

NLP Highlights


The Artifacts Problem in a Data Set

In our dataset we report two types of scores: the label-only accuracy, and the accuracy conditioned on finding the right evidence. A hypothesis-only style evaluation gives us a score of about 50%, which is comparable to MultiNLI. So, yes, that's significantly above chance, but at least it's not far higher than that. It also shows that, if we ignore the requirement for evidence, the artifacts problem is just as much of a problem for us as it is for those datasets.
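The difference between the two scores can be sketched in a few lines of Python. This is a minimal illustration, not the official FEVER scorer. The `Instance` structure, the rule that NOT ENOUGH INFO claims need only the correct label, and the cap of five predicted evidence sentences are assumptions made for this sketch. Under these assumptions, a hypothesis-only model that retrieves no evidence can still earn label-only accuracy, but it can only receive evidence-conditioned credit on NOT ENOUGH INFO claims.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Assumed label string for claims that need no evidence (illustrative).
NEI = "NOT ENOUGH INFO"


@dataclass
class Instance:
    """Hypothetical container for one claim and a system's prediction."""
    gold_label: str
    predicted_label: str
    # Each gold evidence set is a group of (page, sentence_id) pairs that
    # together justify the label; any one complete set is sufficient.
    gold_evidence_sets: list[list[tuple[str, int]]] = field(default_factory=list)
    predicted_evidence: list[tuple[str, int]] = field(default_factory=list)


def label_accuracy(instances: list[Instance]) -> float:
    """Label-only accuracy: the evidence is ignored entirely."""
    return sum(i.predicted_label == i.gold_label for i in instances) / len(instances)


def evidence_found(inst: Instance, max_evidence: int = 5) -> bool:
    """True if some complete gold evidence set appears within the top predictions."""
    predicted = set(inst.predicted_evidence[:max_evidence])
    return any(set(group) <= predicted for group in inst.gold_evidence_sets)


def conditional_accuracy(instances: list[Instance], max_evidence: int = 5) -> float:
    """Accuracy that credits a correct label only if the right evidence was also found."""
    correct = 0
    for inst in instances:
        if inst.predicted_label != inst.gold_label:
            continue  # A wrong label earns no credit.
        if inst.gold_label == NEI or evidence_found(inst, max_evidence):
            correct += 1
    return correct / len(instances)


instances = [
    # Correct label and a complete gold evidence set was retrieved.
    Instance("SUPPORTS", "SUPPORTS", [[("Page_A", 0)]], [("Page_A", 0), ("Page_B", 3)]),
    # Correct label, but only half of the two-sentence evidence set was retrieved.
    Instance("REFUTES", "REFUTES", [[("Page_C", 1), ("Page_C", 2)]], [("Page_C", 1)]),
    # NOT ENOUGH INFO: under this sketch, the label alone suffices.
    Instance(NEI, NEI),
    # Wrong label: no credit under either score.
    Instance("SUPPORTS", "REFUTES", [[("Page_D", 0)]], [("Page_D", 0)]),
]

print(label_accuracy(instances), conditional_accuracy(instances))
```

Running it prints `0.75 0.5`. The second claim has the correct label but incomplete evidence, so it counts toward label-only accuracy but not toward the conditional score. That gap is what a model exploiting claim-only artifacts would show.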

