67 - GLUE: A Multi-Task Benchmark and Analysis Platform, with Sam Bowman

NLP Highlights

The Risk of Cross-Paper Comparisons

I would actually say that the leaderboard encourages the wrong kind of comparison, because all it shows is that someone built some architecture that got this number. For most of the questions people would use GLUE to answer, cross-paper comparisons aren't going to give you good evidence. That is a real risk. It's a really nice tool that you've introduced. I hope people use it and gain some good understanding from it.
