AXRP - the AI X-risk Research Podcast

27 - AI Control with Buck Shlegeris and Ryan Greenblatt

Apr 11, 2024
Buck Shlegeris and Ryan Greenblatt discuss AI control: mechanisms for preventing AI from taking over the world even if it tries to. They cover protocols for AI control, preventing AIs from communicating dangerously in code, whether AIs might be unpredictably uncontrollable, and the impact of AI control on the AI safety field.
INSIGHT

AI Control Focus and Simplicity

  • AI control targets risks from models intentionally subverting safety measures rather than assuming models won't try to cause problems.
  • It is easier to ensure safety by making sure AIs cannot cause harm even if they try, rather than ensuring they don't want to cause harm.
ADVICE

Trusted Model Monitors Smarter AI

  • Use a trusted, less capable model to monitor actions of smarter AIs for sketchy behavior before executing their commands.
  • Report suspicious actions to humans for review and escalate if deemed problematic to prevent unsafe behavior.
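The monitoring protocol sketched above can be illustrated with a toy example. This is an illustrative sketch only, not code from the episode; the names (`suspicion_score`, `AUDIT_THRESHOLD`, the marker list) and the threshold value are all assumed for demonstration.

```python
# Toy sketch of trusted monitoring: a trusted, weaker model scores each
# action proposed by a smarter untrusted model, and suspicious actions
# are escalated to a human before execution. All details are assumptions.

AUDIT_THRESHOLD = 0.8  # assumed cutoff for escalating to human review


def suspicion_score(action: str) -> float:
    """Stand-in for the trusted, less capable monitor: rates how sketchy
    a proposed shell action looks (0 = benign, 1 = clearly suspicious)."""
    sketchy_markers = ["rm -rf", "curl http", "base64"]  # toy heuristics
    hits = sum(marker in action for marker in sketchy_markers)
    return min(1.0, hits / 2)


def run_with_monitoring(action: str, human_approves) -> str:
    """Execute the untrusted model's action only if the trusted monitor
    (and, when escalated, a human reviewer) signs off."""
    if suspicion_score(action) >= AUDIT_THRESHOLD:
        if not human_approves(action):
            return "blocked"
    return "executed"
```

In a real deployment the heuristic scorer would be a trusted language model, and human review capacity would limit how many actions can be escalated, which is why the threshold matters.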
INSIGHT

Human-Level AIs Are Likely Controllable

  • AI systems roughly human-level in capabilities and behavior are likely controllable with current techniques.
  • Humans themselves sometimes try to cause problems, yet organizations keep them in check via monitoring and constraints similar to AI control.