AXRP - the AI X-risk Research Podcast

47 - David Rein on METR Time Horizons

Jan 2, 2026
David Rein, a researcher at METR who specializes in evaluating AI capabilities, explains how METR measures AI systems' ability to complete long tasks. He describes what a time horizon means for models such as Claude Opus 4.5 and why it matters for tracking AI progress. The discussion covers why task length matters, example tasks of varying difficulty, and the implications of rapid gains in AI capabilities. Rein also discusses the difficulty of measuring how effective models really are, and the future risks of AI progress, including its potential to outpace human developers.
INSIGHT

Time Horizon Measures Model Agency

  • METR defines a model's time horizon as the length of tasks (measured by how long they take humans) that the model completes with roughly 50% probability.
  • Over roughly five years, METR has observed an exponential increase in the length of tasks models can complete.
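As a rough illustration of the definition above, the sketch below fits a logistic curve of success probability against log2 of human task time and reads off where that curve crosses 50%. The logistic-on-log-time form, the function names, and the task data are assumptions for illustration only (the data is invented, not METR results), and the snippet does not spell out METR's exact fitting procedure. A second helper converts two horizon measurements into a doubling time, one common way to summarize exponential growth.

```python
import math

# Hypothetical, made-up outcomes: (minutes a human needs, 1 = model succeeded).
TASKS = [(1, 1), (2, 1), (4, 1), (4, 0), (8, 1), (15, 1),
         (15, 0), (30, 1), (30, 0), (60, 0), (120, 0), (240, 0)]

def fit_logistic(xs, ys, lr=0.1, steps=20000):
    """Fit P(success) = sigmoid(a + b*x) by gradient descent on mean log loss."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            grad_a += (p - y) / n
            grad_b += (p - y) * x / n
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b

def fifty_percent_horizon(tasks):
    """Human task length (minutes) at which predicted success is 50%."""
    xs = [math.log2(minutes) for minutes, _ in tasks]
    ys = [success for _, success in tasks]
    a, b = fit_logistic(xs, ys)
    # sigmoid(a + b*x) = 0.5 exactly when a + b*x = 0, i.e. x = -a/b.
    return 2 ** (-a / b), b

def doubling_time_months(h_start, h_end, months_elapsed):
    """Months per doubling, assuming purely exponential growth between two points."""
    return months_elapsed / math.log2(h_end / h_start)

horizon, slope = fifty_percent_horizon(TASKS)
print(f"50% horizon: {horizon:.1f} human-minutes (slope {slope:.2f})")
# Hypothetical: a horizon growing from 1 to 16 minutes over 24 months is
# 4 doublings, i.e. one doubling every 6 months.
print(doubling_time_months(1.0, 16.0, 24.0))
```

Fitting on log2 of task time means each unit on the x-axis is one doubling of task length, which matches how exponential trends in horizon are usually described.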
ANECDOTE

A Simple File-Name Task

  • An example of an easy task: given filenames such as credentials.txt, pick the file most likely to contain a password.
  • GPT-2 already succeeds on these short, multiple-choice-style tasks by comparing token likelihoods.
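The likelihood approach can be sketched as follows: the model scores each candidate filename as a continuation of the question, and the highest-scoring option is taken as the answer. Running GPT-2 itself is out of scope here, so `toy_log_prob` is a hypothetical keyword-based stand-in for a real model; with an actual language model you would instead sum the token log-probabilities of each option given the prompt.

```python
from typing import Callable, Sequence

def pick_by_likelihood(prompt: str, options: Sequence[str],
                       log_prob: Callable[[str, str], float]) -> str:
    """Return the option with the highest log-probability as a continuation of prompt."""
    return max(options, key=lambda opt: log_prob(prompt, opt))

# Hypothetical stand-in for a language model: it rewards filenames containing
# words that tend to co-occur with "password". It ignores the prompt; a real
# model would condition on it.
_KEYWORD_WEIGHTS = {"credential": 3.0, "password": 3.0, "secret": 2.0, "key": 1.0}

def toy_log_prob(prompt: str, option: str) -> float:
    bonus = sum(w for kw, w in _KEYWORD_WEIGHTS.items() if kw in option.lower())
    # Length penalty mimics longer strings receiving lower probability.
    return -0.1 * len(option) + bonus

prompt = "Which file most likely contains a password? Answer:"
files = ["notes.txt", "credentials.txt", "photo.jpg", "README.md"]
print(pick_by_likelihood(prompt, files, toy_log_prob))
```

Because the model only has to rank a handful of fixed options, no multi-step reasoning or tool use is required, which is why such tasks sit at the very short end of the time-horizon scale.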
ANECDOTE

CSV Parsing As An Intermediate Task

  • An intermediate task: write a 20–30 line script to parse a CSV file of about 50–100 rows.
  • An experienced data scientist might finish it in a few minutes; a junior one might take 15–30 minutes.
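For a sense of scale, here is a sketch of the kind of short script this task calls for. The episode does not specify the CSV's columns or what the script must compute, so the `name,team,hours` schema, the sample rows, and the per-team average are all hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data; the real task's CSV schema is not specified.
SAMPLE_CSV = """name,team,hours
alice,infra,12.5
bob,infra,7
carol,research,20
dave,research,
erin,infra,3.5
"""

def average_hours_by_team(text: str) -> dict:
    """Average the 'hours' column per team, skipping rows with a missing value."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in csv.DictReader(io.StringIO(text)):
        raw = row["hours"].strip()
        if not raw:  # tolerate blank cells instead of crashing
            continue
        totals[row["team"]] += float(raw)
        counts[row["team"]] += 1
    return {team: totals[team] / counts[team] for team in totals}

print(average_hours_by_team(SAMPLE_CSV))
```

Handling small irregularities such as the blank cell is a typical reason a task like this takes a junior practitioner noticeably longer than an expert.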