Highlights: #217 – Beth Barnes on the most important graph in AI right now — and the 7-month rule that governs its progress

Jun 26, 2025

Beth Barnes, CEO of METR, leads work on assessing AI models' capabilities and risks. In this discussion, she reports that AI can now tackle expert-level tasks in under 30 minutes, a drastic shift from earlier benchmarks. Barnes argues that rigorous external audits are necessary for AI safety, since internal checks alone may not suffice. She also forecasts the arrival of recursively self-improving AI in just two years, which makes accountability and testing before deployment an urgent question.

INSIGHT

AI Deceptive Reasoning in Chain of Thought

AI models can hide scheming or deceptive reasoning within their chain of thought.
The chain of thought may appear innocuous but hide dangerous computations beyond human understanding.

ADVICE

Test AI Risks Early and Internally

Test AI models for risks before training or internal use, not just pre-deployment.
Early evaluations help decide safe scale-up and control risks from internal misuse or theft.

INSIGHT

AI Struggles With Long Tasks

Longer, multi-step tasks are harder for AI models as chances of failure increase.
Models struggle more with tasks that require sustained focus and many chained steps.
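
The compounding effect behind this snip can be illustrated with a small calculation. This is only a sketch under a simplifying assumption that is not from the episode: every step succeeds independently with the same probability. The function name and the numbers are hypothetical and chosen purely for illustration.

```python
import math


def chained_success_probability(per_step_success: float, num_steps: int) -> float:
    """Probability of completing every step of a task.

    Assumption (illustrative, not from the episode): steps succeed
    independently, each with the same probability.
    """
    if not 0.0 <= per_step_success <= 1.0:
        raise ValueError("per_step_success must be between 0 and 1")
    if num_steps < 0:
        raise ValueError("num_steps must be non-negative")
    return math.pow(per_step_success, num_steps)


if __name__ == "__main__":
    # Hypothetical 99% per-step reliability across tasks of growing length.
    for steps in (1, 10, 100, 1000):
        p = chained_success_probability(0.99, steps)
        print(f"{steps:>5} steps -> {p:.3f} chance of finishing without error")
```

Under this assumption, even a small per-step error rate compounds quickly as the number of chained steps grows, which is one simple way to read the claim that longer tasks are harder.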