
AXRP - the AI X-risk Research Podcast 47 - David Rein on METR Time Horizons
Jan 2, 2026
David Rein, a researcher at METR specializing in AI capability evaluation, dives deep into measuring AI's ability to handle long tasks. He explains what time horizons mean for models like Claude Opus 4.5 and why they matter for assessing AI progress. The discussion includes the significance of task length, examples of varying difficulties, and the implications of AI's rapid advancements in capabilities. Rein also explores the challenges of measuring effectiveness and future risks associated with AI progression and its potential to outpace human developers.
AI Snips
Time Horizon Measures Model Agency
- METR defines a model's time horizon as the length of task, measured by how long it takes humans, that the model completes with roughly 50% probability.
- Over ~5 years METR observed an exponential increase in task lengths models can complete.
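To make the 50% definition concrete, here is a minimal sketch of one way such a number could be estimated: fit a logistic curve of success against the logarithm of human task time, then solve for the time at which predicted success is 50%. The function name `fit_time_horizon`, the gradient-ascent fit, and the toy results are all assumptions made for illustration, not METR's code or data.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_time_horizon(results, lr=0.05, steps=5000):
    """Estimate the task length (in minutes) at which success is 50%.

    `results` is a list of (human_minutes, succeeded) pairs.
    Fits p(success) = sigmoid(a + b * (log2(minutes) - mean)) by plain
    gradient ascent on the log-likelihood (an illustrative choice).
    """
    xs = [math.log2(minutes) for minutes, _ in results]
    ys = [1.0 if ok else 0.0 for _, ok in results]
    mean_x = sum(xs) / len(xs)  # centering keeps the fit well behaved
    a, b = 0.0, 0.0
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = y - sigmoid(a + b * (x - mean_x))
            grad_a += err
            grad_b += err * (x - mean_x)
        a += lr * grad_a
        b += lr * grad_b
    # Success probability is 0.5 where a + b * (x - mean_x) = 0.
    return 2.0 ** (mean_x - a / b)

# Made-up results: (human minutes, did the model succeed?)
toy_results = [
    (1, True), (2, True), (4, True), (8, True), (8, False),
    (16, True), (16, False), (32, False), (64, False), (128, False),
]
horizon = fit_time_horizon(toy_results)
print(f"Estimated 50% time horizon: {horizon:.1f} minutes")
```

The toy data is deliberately symmetric around the 8 to 16 minute range, so the estimate lands between those two task lengths. Fitting on log time reflects the idea that doubling a task's length matters about as much at one minute as at one hour.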
A Simple File-Name Task
- An example easy task: pick which file likely contains a password from filenames like credentials.txt.
- GPT-2 already succeeds on these short multiple-choice style tasks using token likelihoods.
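Scoring a multiple-choice task with token likelihoods can be sketched as follows: each candidate answer gets the sum of the log-probabilities the model assigns to its tokens, and the highest-scoring candidate is taken as the model's answer. The per-token numbers below are invented for illustration and are not GPT-2 outputs; `pick_option` and every filename other than credentials.txt are hypothetical.

```python
def pick_option(option_token_logprobs):
    """Return the option whose tokens have the highest total log-probability.

    `option_token_logprobs` maps each candidate answer to the list of
    per-token log-probabilities a language model assigned to it.
    """
    scores = {
        option: sum(logprobs)
        for option, logprobs in option_token_logprobs.items()
    }
    return max(scores, key=scores.get)

# Invented per-token log-probabilities, not real model outputs.
toy_logprobs = {
    "credentials.txt": [-0.9, -0.4],
    "notes.txt": [-2.1, -0.5],
    "todo.md": [-2.6, -1.2],
}
choice = pick_option(toy_logprobs)
print(choice)
```

Summing raw log-probabilities favors answers with fewer tokens, so a length-normalized score (dividing by the token count) is a common alternative when candidates differ in length.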
CSV Parsing As An Intermediate Task
- An intermediate task: write a 20–30 line script to parse a CSV of ~50–100 rows.
- An experienced data scientist might finish it in a few minutes, while a junior might need 15–30 minutes.
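For a sense of scale, a script of roughly this size might look like the sketch below. The snip does not describe the actual task, so the column names, the sample rows, and the per-group average are assumptions chosen only to show what a 20–30 line CSV parser looks like.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data; a real task file would have ~50-100 rows.
SAMPLE_CSV = """name,department,salary
Ada,engineering,120
Grace,engineering,100
Linus,support,60
Ken,support,
"""

def mean_by_group(csv_text, group_col, value_col):
    """Average `value_col` per distinct `group_col`, skipping blank values."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        raw = (row[value_col] or "").strip()
        if not raw:
            continue  # ignore rows with a missing value
        totals[row[group_col]] += float(raw)
        counts[row[group_col]] += 1
    return {group: totals[group] / counts[group] for group in totals}

averages = mean_by_group(SAMPLE_CSV, "department", "salary")
for department, average in sorted(averages.items()):
    print(f"{department}: {average:.1f}")
```

Reading from a string through `io.StringIO` keeps the example self-contained; with a real file, `open(path, newline="")` would take its place.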


