67 - GLUE: A Multi-Task Benchmark and Analysis Platform, with Sam Bowman

NLP Highlights

00:00

The Limits of Generalization in Language Understanding

We train models on these huge, often artificially constructed datasets. And then what we really want to do is understand what's actually going on, and we can probe them in specific ways. There were a lot of conversations about this issue: that looking at test sets drawn from the same distributions as the training sets for high-profile language understanding tasks gives us an inaccurate view of how well our systems work. The results on our diagnostic set have shown some of the same things. We're already seeing some split, in that there are some models like ELMo where there is an exact set of parameters shared in common.
