47 - David Rein on METR Time Horizons

21 snips

Jan 2, 2026

David Rein, a researcher at METR specializing in AI capability evaluation, dives deep into measuring AI's ability to handle long tasks. He explains what time horizons mean for models like Claude Opus 4.5 and why they matter for assessing AI progress. The discussion includes the significance of task length, examples of varying difficulties, and the implications of AI's rapid advancements in capabilities. Rein also explores the challenges of measuring effectiveness and future risks associated with AI progression and its potential to outpace human developers.


INSIGHT

Time Horizon Measures Model Agency

METR defines a model's time horizon as the length of task, measured by how long it takes a human, that the model completes successfully about 50% of the time.
Over roughly five years, METR has observed exponential growth in the length of tasks that models can complete.
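
To make the 50% definition concrete, here is a minimal sketch of one way to estimate a time horizon: fit a logistic curve of success against the log of human completion time, then read off the task length where the curve crosses 50%. The data, the function name fit_time_horizon, and the plain gradient-ascent fit are all illustrative assumptions, not METR's actual code or results.

```python
import math

# Hypothetical (human_minutes, model_succeeded) pairs, invented for illustration.
HYPOTHETICAL_RESULTS = [
    (1, 1), (2, 1), (4, 1), (8, 1), (8, 0), (16, 1),
    (16, 0), (32, 1), (32, 0), (64, 0), (128, 0), (256, 0),
]

def fit_time_horizon(results, steps=5000, lr=0.05):
    """Fit P(success) = sigmoid(a + b * xc), where xc is centered log2(minutes),
    and return the task length in minutes at which P(success) = 0.5."""
    xs = [math.log2(minutes) for minutes, _ in results]
    ys = [success for _, success in results]
    mean_x = sum(xs) / len(xs)
    a, b = 0.0, 0.0
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, ys):
            xc = x - mean_x
            p = 1.0 / (1.0 + math.exp(-(a + b * xc)))
            grad_a += y - p          # gradient of the log-likelihood w.r.t. a
            grad_b += (y - p) * xc   # gradient of the log-likelihood w.r.t. b
        a += lr * grad_a / len(xs)
        b += lr * grad_b / len(xs)
    # The curve crosses 50% where a + b * xc = 0, i.e. xc = -a / b.
    return 2 ** (mean_x - a / b)

horizon_minutes = fit_time_horizon(HYPOTHETICAL_RESULTS)
print(f"Estimated 50% time horizon: {horizon_minutes:.1f} minutes")
```

Working in log time matches the exponential trend described above: if the horizon doubles at a steady rate, it moves along this axis in equal steps.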

ANECDOTE

A Simple File-Name Task

An example of an easy task: given a list of filenames, pick the one most likely to contain a password, such as credentials.txt.
GPT-2 already succeeds on these short multiple-choice style tasks using token likelihoods.
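
Scoring a multiple-choice task with token likelihoods can be sketched as follows: ask the model how likely each candidate answer is as a continuation of the prompt, and pick the highest-scoring one. The log_prob callable is a hypothetical stand-in for a real language model, and toy_log_prob is an invented keyword heuristic so that the example runs on its own; neither reflects how METR or GPT-2 is actually wired up.

```python
def pick_option(prompt, options, log_prob):
    """Return the option that the scoring function rates as the most likely answer."""
    return max(options, key=lambda option: log_prob(prompt, option))

def toy_log_prob(prompt, option):
    """Hypothetical scorer standing in for a model's summed token log-probabilities.
    It simply favors filenames containing credential-like words."""
    keywords = ("credential", "password", "secret")
    return 0.0 if any(word in option.lower() for word in keywords) else -5.0

prompt = "Which file most likely contains a password?"
options = ["notes.txt", "credentials.txt", "photo.png"]
print(pick_option(prompt, options, toy_log_prob))
```

With a real model, log_prob would sum the log-probabilities the model assigns to the option's tokens given the prompt.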

ANECDOTE

CSV Parsing As An Intermediate Task

An intermediate task: write a 20–30 line script to parse a CSV of ~50–100 rows.
An experienced data scientist might finish it in a few minutes, while a junior might need 15–30 minutes.
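
For a sense of scale, a script of roughly this size might look like the sketch below. The episode summary does not specify the actual task, so the column names, the sample data, and the per-group average are assumptions made purely for illustration.

```python
import csv
import io
from collections import defaultdict

def mean_by_group(csv_text, group_col, value_col):
    """Parse CSV text and return the mean of value_col for each group_col value.
    Rows with an empty value are skipped."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        raw = row[value_col].strip()
        if not raw:
            continue  # skip rows with a missing value
        totals[row[group_col]] += float(raw)
        counts[row[group_col]] += 1
    return {group: totals[group] / counts[group] for group in totals}

# Hypothetical sample data; a real task file would have ~50-100 rows.
SAMPLE = "team,score\nred,10\nblue,4\nred,20\nblue,\n"
print(mean_by_group(SAMPLE, "team", "score"))
```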
