Justified Posteriors

Evaluating GDPVal, OpenAI's Eval for Economic Value

Nov 4, 2025
Dive into the world of AI evaluations with a focus on GDPVal, OpenAI's new metric. Unlike traditional macroeconomic frameworks, this approach assesses AI's economic impact task by task. Surprising findings show AI models such as Claude reaching near human parity on many tasks. The discussion also explores the complexities of task design and the role of prompt engineering in AI performance. Expect insights into the economic value automation could bring, along with the challenges of automating knowledge work.
INSIGHT

Task-Level GDP Valuation

  • OpenAI measures AI economic value by evaluating hundreds of real tasks and mapping them to occupational wages.
  • This bottom-up, task-level approach aims to estimate practical GDP impact rather than relying on aggregate averages.
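A minimal sketch of the bottom-up idea, assuming each task records how long a paid expert needs to complete it, the hourly wage for that occupation, and whether a grader preferred the model's deliverable. The `Task` fields, the example numbers, and the notion of summing wage value over tasks the model wins are hypothetical illustrations, not OpenAI's actual GDPVal methodology.

```python
from dataclasses import dataclass


@dataclass
class Task:
    occupation: str
    expert_hours: float  # hypothetical: time a paid expert needs for the task
    hourly_wage: float   # hypothetical: occupational wage in USD per hour
    model_wins: bool     # hypothetical: grader preferred the model's deliverable


def task_value(task: Task) -> float:
    """Wage cost of the task if an expert does it: hours times hourly wage."""
    return task.expert_hours * task.hourly_wage


def win_rate(tasks: list[Task]) -> float:
    """Share of head-to-head comparisons in which the model's output won."""
    return sum(t.model_wins for t in tasks) / len(tasks)


def value_of_won_tasks(tasks: list[Task]) -> float:
    """Crude illustration: total expert wage cost of the tasks the model won."""
    return sum(task_value(t) for t in tasks if t.model_wins)


# Hypothetical example data, not figures from GDPVal.
tasks = [
    Task("manufacturing engineer", expert_hours=10.0, hourly_wage=50.0, model_wins=False),
    Task("accountant", expert_hours=4.0, hourly_wage=40.0, model_wins=True),
]

print(win_rate(tasks))            # 0.5
print(value_of_won_tasks(tasks))  # 160.0
```

Here the model wins one of two tasks (a 50% win rate), and the won task accounts for $160 of the $660 in total expert wage cost.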
INSIGHT

Low Prior For AI Win Rates

  • Seth and Andrey expected generic AIs to win only about 10% of head-to-head comparisons against paid experts.
  • That prior framed their surprise at OpenAI's reported win rates.
ANECDOTE

Complex CAD To PDF Task Example

  • As an example, Andrey describes a complex manufacturing-engineer task that requires CAD work and a PDF deliverable.
  • Seth admits that neither of them could complete it without weeks of work or help from AI.