Evaluating GDPVal, OpenAI's Eval for Economic Value

Justified Posteriors

How were models evaluated head-to-head with humans?

Andrey explains the pairwise evaluation setup where experts choose which output is better, spending about an hour per evaluation.

Play episode from 19:13
