Emily M. Bender — Language Models and Linguistics

Gradient Dissent: Conversations on AI

CHAPTER

Can Benchmarks Be Used as a Benchmark?

This can be used sort of as a sanity check: okay, did my system actually do better than a super naive baseline? Or, I want to compare some systems head to head, so let's use this benchmark. You might also use test suites, which are put together to sort of map out particular kinds of cases that you want to handle well. There's also adversarial testing, where people will create test sets by going and collecting all the examples that previous systems did poorly on. And then another one is what we did in the Build It, Break It shared tasks: two examples that were minimally different from each other, but for which the systems would work for one but not
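As a rough illustration of two of the ideas mentioned here, the sketch below compares a system against a naive majority-class baseline and then flags minimal pairs where the system gets exactly one of the two examples right. Everything in it is hypothetical and chosen for illustration only: the toy sentiment data, the `toy_system` keyword classifier, and the helper names are assumptions, not anything taken from the episode or from the shared task itself.

```python
from collections import Counter


def accuracy(predict, examples):
    """Fraction of (text, label) examples the predictor labels correctly."""
    return sum(predict(text) == label for text, label in examples) / len(examples)


def majority_baseline(train_examples):
    """Build a naive predictor that always returns the most common training label."""
    label_counts = Counter(label for _, label in train_examples)
    most_common_label = label_counts.most_common(1)[0][0]
    return lambda text: most_common_label


def failing_minimal_pairs(predict, pairs):
    """Return pairs where the system is right on exactly one of the two examples."""
    return [
        (a, b)
        for a, b in pairs
        if (predict(a[0]) == a[1]) != (predict(b[0]) == b[1])
    ]


# Hypothetical toy data, for illustration only.
train = [("great film", "pos"), ("loved it", "pos"), ("dull plot", "neg")]
test = [("great cast", "pos"), ("not great", "neg"), ("loved it", "pos"), ("dull", "neg")]


def toy_system(text):
    """Hypothetical keyword classifier standing in for a real system."""
    return "pos" if ("great" in text or "loved" in text) else "neg"


baseline = majority_baseline(train)
baseline_acc = accuracy(baseline, test)
system_acc = accuracy(toy_system, test)

# Sanity check: did the system beat the naive baseline?
beats_baseline = system_acc > baseline_acc

# Two minimally different examples with different gold labels.
pairs = [(("great film", "pos"), ("not great film", "neg"))]
broken = failing_minimal_pairs(toy_system, pairs)
```

In this toy setup the keyword system beats the baseline on overall accuracy yet still fails the negation pair, which is the kind of gap an aggregate benchmark score can hide.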
