AI models today have a 50% chance of successfully completing a task that would take an expert human one hour. Seven months ago, that number was roughly 30 minutes — and seven months before that, 15 minutes.
These are substantial, multi-step tasks requiring sustained focus: building web applications, conducting machine learning research, or solving complex programming challenges.
Today’s guest, Beth Barnes, is CEO of METR (Model Evaluation & Threat Research) — the leading organisation measuring these capabilities.
These highlights are from episode #217 of The 80,000 Hours Podcast, *Beth Barnes on the most important graph in AI right now — and the 7-month rule that governs its progress*. They include:
- Can we see AI scheming in the chain of thought? (00:00:34)
- We have to test models' honesty even before they're used inside AI companies (00:05:48)
- It's essential to thoroughly test relevant real-world tasks (00:10:13)
- Recursively self-improving AI might even be here in two years — which is alarming (00:16:09)
- Do we need external auditors doing AI safety tests, not just the companies themselves? (00:21:55)
- A case against safety-focused people working at frontier AI companies (00:29:30)
- Open-weighting models is often good, and Beth has changed her attitude about it (00:34:57)
These aren't necessarily the most important or even most entertaining parts of the interview — so if you enjoy this, we strongly recommend checking out the full episode!
And if you're finding these highlights episodes valuable, please let us know by emailing podcast@80000hours.org.
Highlights put together by Ben Cordell, Milo McGuire, and Dominic Armstrong