
67 - GLUE: A Multi-Task Benchmark and Analysis Platform, with Sam Bowman
NLP Highlights
The Limits of Generalization in Language Understanding
We train models on these huge, often artificially constructed datasets. What we really want to do is understand what's going on inside them, and we can probe them in specific ways. There were a lot of conversations about the issue that looking at test sets drawn from the same distributions as the training sets, for high-profile language understanding tasks, gives us an inaccurate view of how well our systems work. The results on our diagnostic set have shown some of the same things. We're already seeing something of a split, in that there are some models, like ELMo, where an exact set of parameters is shared in common.