"The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis

AI Control: Using Untrusted Systems Safely with Buck Shlegeris of Redwood Research, from the 80,000 Hours Podcast

May 4, 2025
In a compelling conversation, Buck Shlegeris, CEO of Redwood Research and a leader in AI safety, discusses innovative strategies for controlling untrusted AI systems. He delves into techniques like always-on auditing and honeypotting, shedding light on how to monitor AI behaviors effectively. Buck emphasizes the risks of AI misalignment and the importance of proactive measures to ensure safe outcomes. He also explores the ethical implications of AI evolution, calling for collaborative approaches to enhance safety in this rapidly developing field.
INSIGHT

AI Control vs Alignment

  • AI control assumes AIs are misaligned but seeks safe deployment.
  • This approach complements alignment research by focusing on containment, not perfect trust.
INSIGHT

AI vs Human Spy Dynamics

  • Unlike human spies, who operate independently, AI instances are likely either all scheming or none are.
  • We also uniquely control the information flow to AIs, which enables monitoring techniques unavailable against human adversaries.
INSIGHT

Auditing Rare AI Threats

  • Acutely risky AI actions are rare but highly incriminating, and can be detected with proper investigation.
  • Effective auditing requires escalating suspicious cases through a hierarchy until they reach human review.