Scaling LLMs and Accelerating Adoption with Aidan Gomez at Cohere

Gradient Dissent: Conversations on AI

How to Measure Academic Data Set Performance

Evaluation of these models is incredibly hard for a bunch of different reasons. The only reliable measure is actually putting it in front of people and asking them which model they prefer: put two models in front of them. You can just measure that in academic dataset performance. If you throw a base model at these academic datasets and you throw a Command model, the Command model is going to perform just leagues and leagues better, dramatically better.

How much effort is this step? Is it comparable in cost to training the base model? How long does it take? How expensive is it?

It's really hard to get right.
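The side-by-side human preference comparison described above can be summarized as a win rate. The following is a minimal sketch, not the method used at Cohere: the `win_rate` function, the `"tie"` label, and the vote data are all hypothetical and exist only for illustration.

```python
from collections import Counter


def win_rate(preferences, model):
    """Return the fraction of decided (non-tie) judgments that favored `model`.

    `preferences` is a list with one entry per human judgment: the name of
    the preferred model, or "tie" when the rater had no preference.
    """
    counts = Counter(preferences)
    decided = sum(n for name, n in counts.items() if name != "tie")
    if decided == 0:
        return 0.0  # no decided judgments, so no evidence either way
    return counts[model] / decided


# Hypothetical judgments from raters shown two model outputs side by side.
votes = ["command", "command", "base", "tie", "command"]
print(win_rate(votes, "command"))
```

Ties are excluded from the denominator here; counting them as half a win for each model is another common convention.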
