
67 - GLUE: A Multi-Task Benchmark and Analysis Platform, with Sam Bowman
NLP Highlights
The Risk of Cross-Paper Comparisons
I would actually say that the leaderboard encourages the wrong kind of comparison, because all it shows is that someone built some architecture that got this number. For most of the questions people would use GLUE to answer, cross-paper comparisons aren't going to give you good evidence. That is a real risk. It's a really nice tool that you've introduced. I hope people use it and gain some good understanding from it.