Evaluating GDPVal, OpenAI's Eval for Economic Value

Justified Posteriors

Why are pure-text tasks scoring lower than expected?

They explore taste, expert detection, and why text tasks may see lower AI win rates despite common assumptions.

Play episode from 32:35
