

Emily M. Bender — Language Models and Linguistics

Gradient Dissent: Conversations on AI

CHAPTER

Can Benchmarks Be Used as a Benchmark?

This can be used as a sanity check: okay, did my system actually do better than a super naive baseline? Or I want to compare some systems head to head, so let's use this benchmark. You might also use test suites, which are put together to map out particular kinds of cases that you want to handle well. There's also adversarial testing, where people create test sets by collecting all the examples that previous systems did poorly on. And then another one is what we did in the build it, break it shared tasks: pairs of examples that were minimally different from each other, where the systems would work for one but not
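The adversarial-filtering and minimal-pair ideas mentioned here can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not the procedure used in the build it, break it shared tasks: the names `adversarial_subset`, `minimal_pair_failures`, and the toy keyword classifier `naive` are hypothetical, and a "system" is modeled simply as a function from input text to a label.

```python
from typing import Callable, List, Tuple

# Hypothetical modeling choices: a system maps text to a label,
# and an example is an (input text, gold label) pair.
System = Callable[[str], str]
Example = Tuple[str, str]


def adversarial_subset(examples: List[Example], prior_systems: List[System]) -> List[Example]:
    """Keep only the examples that every prior system got wrong."""
    return [
        (text, gold)
        for text, gold in examples
        if all(system(text) != gold for system in prior_systems)
    ]


def minimal_pair_failures(
    pairs: List[Tuple[Example, Example]], system: System
) -> List[Tuple[Example, Example]]:
    """Return the minimal pairs where the system is right on one member but wrong on the other."""
    failures = []
    for a, b in pairs:
        a_ok = system(a[0]) == a[1]
        b_ok = system(b[0]) == b[1]
        if a_ok != b_ok:
            failures.append((a, b))
    return failures


def naive(text: str) -> str:
    """Toy 'super naive baseline': predicts negative whenever the word 'not' appears."""
    return "neg" if "not" in text.split() else "pos"


examples = [
    ("the film was good", "pos"),
    ("the film was not good", "neg"),
    ("the film was not bad", "pos"),
]
# Only "the film was not bad" fools the naive baseline.
hard_examples = adversarial_subset(examples, [naive])

# A one-word minimal pair: the baseline handles the first member but not the second.
pairs = [(("the film was not good", "neg"), ("the film was not bad", "pos"))]
pair_failures = minimal_pair_failures(pairs, naive)
```

Filtering against several prior systems with `all(...)` keeps only examples that none of them solved; a looser variant could keep examples that most systems missed.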

