

27 - AI Control with Buck Shlegeris and Ryan Greenblatt
106 snips Apr 11, 2024
Buck Shlegeris and Ryan Greenblatt discuss AI control: mechanisms for preventing AI from taking over the world. They cover protocols for AI control, preventing AIs from communicating in dangerous coded ways, unpredictably uncontrollable AI, and the impact of AI control on the AI safety field.
AI Snips
AI Control Focus and Simplicity
- AI control targets risks from models intentionally subverting safety measures rather than assuming models won't try to cause problems.
- It is easier to ensure safety by making sure AIs cannot cause harm even if they try, rather than ensuring they don't want to cause harm.
Trusted Model Monitors Smarter AI
- Use a trusted, less capable model to monitor the actions of a smarter AI and flag sketchy behavior before its commands are executed.
- Report suspicious actions to humans for review, and escalate those deemed problematic to prevent unsafe behavior.
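The monitoring loop described in this snip can be sketched roughly as follows. This is a minimal illustration, not the protocol from the episode: the names `trusted_monitoring`, `toy_monitor`, and `Decision`, the 0.5 threshold, and the keyword-based scoring are all hypothetical stand-ins. In practice the suspicion score would come from a trusted, weaker model rather than a keyword check.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Decision:
    action: str
    suspicion: float
    outcome: str  # "executed", "approved_by_human", or "blocked"


def trusted_monitoring(
    action: str,
    score_suspicion: Callable[[str], float],  # stands in for the trusted model
    human_review: Callable[[str], bool],      # returns True if a human approves
    execute: Callable[[str], None],
    threshold: float = 0.5,                   # assumed cutoff, purely illustrative
) -> Decision:
    """Score an untrusted model's proposed action before running it."""
    suspicion = score_suspicion(action)
    if suspicion < threshold:
        # Not suspicious: run the action without involving a human.
        execute(action)
        return Decision(action, suspicion, "executed")
    # Suspicious: escalate to a human, and only run the action if approved.
    if human_review(action):
        execute(action)
        return Decision(action, suspicion, "approved_by_human")
    return Decision(action, suspicion, "blocked")


def toy_monitor(action: str) -> float:
    """Hypothetical monitor: flags a few obviously risky shell fragments."""
    risky = ("rm -rf", "chmod 777")
    return 1.0 if any(fragment in action for fragment in risky) else 0.0


# Usage: a benign command runs; a risky one is escalated and rejected.
executed = []
first = trusted_monitoring("ls -la", toy_monitor, lambda a: False, executed.append)
second = trusted_monitoring("rm -rf /", toy_monitor, lambda a: False, executed.append)
```

The key design point is that nothing proposed by the smarter model runs until the trusted monitor has scored it, so scarce human attention is spent only on the flagged actions.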
Human-Level AIs Are Likely Controllable
- AI systems roughly human-level in capabilities and behavior are likely controllable with current techniques.
- Humans also cause problems, yet they can still be controlled through monitoring and constraints, much as in AI control.