

Highlights: #217 – Beth Barnes on the most important graph in AI right now — and the 7-month rule that governs its progress
Jun 26, 2025
Beth Barnes, CEO of METR, leads work on assessing AI models' capabilities and risks. In this discussion, she reports that AI can now tackle expert-level tasks in under 30 minutes, a drastic shift from earlier benchmarks. Barnes emphasizes the need for rigorous external audits of AI safety, arguing that internal checks alone may not suffice. She also forecasts the arrival of recursively self-improving AI in just two years, prompting urgent conversations about accountability and testing before deployment.
AI Snips
AI Deceptive Reasoning in Chain of Thought
- AI models can hide scheming or deceptive reasoning within their chain of thought.
- The chain of thought may appear innocuous but hide dangerous computations beyond human understanding.
Test AI Risks Early and Internally
- Test AI models for risks before training or internal use, not just pre-deployment.
- Early evaluations inform decisions about whether it is safe to scale up, and help control risks from internal misuse or theft.
AI Struggles With Long Tasks
- Longer, multi-step tasks are harder for AI models as chances of failure increase.
- Models struggle more with tasks that require sustained focus and many chained steps.
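The point about chained steps can be illustrated with a small arithmetic sketch. It assumes, purely for illustration, that every step succeeds independently with the same probability; that simplification and the function name `chain_success_probability` are not from the episode. Under that assumption, the chance of finishing an n-step task is the per-step success rate raised to the power n.

```python
def chain_success_probability(per_step_success: float, num_steps: int) -> float:
    """Probability that all steps succeed, assuming independent, identical steps.

    Illustrative model only: real tasks have correlated steps and
    opportunities to recover from errors, which this ignores.
    """
    if not 0.0 <= per_step_success <= 1.0:
        raise ValueError("per_step_success must be between 0 and 1")
    if num_steps < 0:
        raise ValueError("num_steps must be non-negative")
    return per_step_success ** num_steps


if __name__ == "__main__":
    # Hypothetical per-step reliability of 99%, chosen only for the example.
    for steps in (1, 10, 50, 100):
        probability = chain_success_probability(0.99, steps)
        print(f"{steps:>3} steps: {probability:.3f}")
```

Even with a hypothetical 99% per-step success rate, a 100-step chain completes only about 37% of the time under this model, which is one way to see why longer tasks are harder.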