
67 - GLUE: A Multi-Task Benchmark and Analysis Platform, with Sam Bowman
NLP Highlights
00:00
The Diagnostic Data Set for GLUE: A Set of 1,000 Textual Entailment Examples
The diagnostic data set we have now is a set of about 1,000 textual entailment examples. We're looking at predicate argument structure and how well the models capture that. The performance on that data doesn't count toward your primary score. There's just something you can click on a system on the leaderboard,. You can see how it does on these categories likepredicate argument structure. And here you just evaluate your M&LI classifier, your multi-analyze classifier, on this data.
Transcript
Play full episode