Dwarkesh Podcast

How Does Claude 4 Think? — Sholto Douglas & Trenton Bricken

May 22, 2025
Sholto Douglas, a reinforcement learning researcher at Anthropic, and Trenton Bricken, an expert in mechanistic interpretability, discuss the evolving landscape of AI. They cover the latest advances in reinforcement learning and the implications of AI performing tasks at a human level. They also explore how to trace AI models' thought processes and the challenges of aligning AI with human values, and they address the future of AI in the workplace, emphasizing the need for individuals to adapt to and engage with these technologies.
INSIGHT

RL Achieves Expert Human Performance

  • Reinforcement learning with language models has finally been shown to reach expert human-level performance on complex tasks such as competitive programming and math.
  • Long-running autonomous agentic performance is emerging now and is expected to improve significantly within a year.
INSIGHT

Why Software Excels in RL

  • Software engineering benefits from very clear, verifiable reward signals like passing unit tests, making RL highly effective.
  • Creative tasks such as writing require taste, which is harder to quantify and reward precisely.
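To make "verifiable reward signal" concrete, here is a minimal sketch of a reward computed from unit tests. It is an illustration only, not the training setup discussed in the episode: the function name `unit_test_reward`, the test-case format, and the two toy candidates are all assumptions made for this example.

```python
from typing import Callable, List, Tuple

# Hypothetical test-case format: (input arguments, expected output).
TestCase = Tuple[tuple, object]


def unit_test_reward(candidate: Callable, tests: List[TestCase]) -> float:
    """Return the fraction of test cases the candidate passes (0.0 to 1.0)."""
    if not tests:
        return 0.0
    passed = 0
    for args, expected in tests:
        try:
            if candidate(*args) == expected:
                passed += 1
        except Exception:
            # A crash counts as a failed test rather than stopping the scoring.
            pass
    return passed / len(tests)


# Two toy "model outputs" for the task "add two numbers" (assumed for illustration).
def correct_add(a, b):
    return a + b


def buggy_add(a, b):
    return a - b


TESTS = [((1, 2), 3), ((0, 0), 0), ((-1, 1), 0), ((2, 2), 4)]

print(unit_test_reward(correct_add, TESTS))  # 1.0
print(unit_test_reward(buggy_add, TESTS))    # 0.25
```

The score is objective and cheap to compute, which is what makes code a convenient RL target. No comparably simple check exists for whether a piece of writing shows good taste.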
INSIGHT

RL Adds New Knowledge to Models

  • RL can teach neural nets new knowledge beyond pre-training given a clean and sufficient reward signal.
  • The key limitation is often the availability and quality of feedback, which constrains how much the model can learn.