The Diagnostic Data Set for GLUE: A Set of 1,000 Textual Entailment Examples

The diagnostic data set we have now is a set of about 1,000 textual entailment examples. We're looking at predicate argument structure and how well the models capture that. The performance on that data doesn't count toward your primary score. There's just something you can click on a system on the leaderboard,. You can see how it does on these categories likepredicate argument structure. And here you just evaluate your M&LI classifier, your multi-analyze classifier, on this data.

Play episode from 10:29

Transcript

The AI-powered Podcast Player

Save insights by tapping your headphones, chat with episodes, discover the best highlights - and more!

Get the app