Scaling LLMs and Accelerating Adoption with Aidan Gomez at Cohere

Gradient Dissent: Conversations on AI

CHAPTER

How to Measure Academic Data Set Performance

Evaluation of these models is incredibly hard for a bunch of different reasons. The only reliable measure is actually putting two models in front of people and asking them which one they prefer. You can just measure that in academic data set performance. If you throw a base model at these academic data sets and you throw a Command model at them, the Command model is going to perform leagues and leagues better, dramatically better.

How much effort is this step? Is it comparable in cost to training the base model? How long does it take, and how expensive is it?

It's really hard to get right.
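As a rough illustration of the side-by-side human comparison mentioned in this excerpt, the sketch below computes a simple win rate from annotator votes. The vote labels, the `win_rate` helper, and the sample data are hypothetical and made up for illustration; nothing here reflects how Cohere actually runs its evaluations.

```python
from collections import Counter


def win_rate(preferences, model="command"):
    """Return the fraction of decided (non-tie) comparisons won by `model`."""
    counts = Counter(preferences)
    # Ties carry no preference signal, so they are excluded from the denominator.
    decided = sum(n for label, n in counts.items() if label != "tie")
    if decided == 0:
        raise ValueError("no decided comparisons")
    return counts[model] / decided


# Hypothetical votes: each entry is the output one annotator preferred
# when shown a base-model response and a command-model response together.
votes = ["command", "command", "base", "command", "tie", "command"]
rate = win_rate(votes)
print(f"command model preferred in {rate:.0%} of decided comparisons")
```

A real study would also randomize which response appears first and collect enough votes to put a confidence interval around the rate.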
