AXRP - the AI X-risk Research Podcast

27 - AI Control with Buck Shlegeris and Ryan Greenblatt

Apr 11, 2024

Buck Shlegeris and Ryan Greenblatt discuss AI control: safety measures intended to prevent AI from taking over the world. They cover protocols for AI control, preventing dangerous coded communication by AIs, unpredictably uncontrollable AI, and the impact of AI control on the AI safety field.

INSIGHT

AI Control Focus and Simplicity

AI control targets risks from models intentionally subverting safety measures rather than assuming models won't try to cause problems.
It is easier to ensure that AIs cannot cause harm even if they try than to ensure that they do not want to cause harm.

ADVICE

Trusted Model Monitors Smarter AI

Use a trusted, less capable model to check the actions of smarter AIs for sketchy behavior before their commands are executed.
Report suspicious actions to humans for review, and escalate any that the reviewers deem problematic.
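
The monitoring loop described in this snip can be sketched roughly as follows. This is a minimal illustration and not the protocol from the episode: the names `run_with_trusted_monitor`, `suspicion_score`, `human_review`, and `toy_suspicion_score`, the 0.5 threshold, and the keyword list are all hypothetical, and the keyword scorer is only a stand-in for a real trusted model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Decision:
    action: str
    suspicion: float
    outcome: str  # "executed" or "escalated"


def run_with_trusted_monitor(
    proposed_actions: List[str],
    suspicion_score: Callable[[str], float],  # stand-in for the trusted model
    human_review: Callable[[str], bool],      # True means "problematic"
    threshold: float = 0.5,                   # assumed cutoff, not from the source
) -> List[Decision]:
    """Score each proposed action before execution and route suspicious ones to a human."""
    decisions = []
    for action in proposed_actions:
        score = suspicion_score(action)
        if score < threshold:
            outcome = "executed"      # monitor saw nothing sketchy
        elif human_review(action):
            outcome = "escalated"     # reviewer deemed it problematic, so do not run it
        else:
            outcome = "executed"      # reviewer cleared the flagged action
        decisions.append(Decision(action, score, outcome))
    return decisions


def toy_suspicion_score(action: str) -> float:
    """Hypothetical keyword scorer; a real monitor would be a trusted model."""
    risky_terms = ("rm -rf", "curl", "ssh")
    return 1.0 if any(term in action for term in risky_terms) else 0.0
```

In this sketch nothing is actually executed; the function only records which actions would run and which would be held back, so the routing logic stays easy to inspect.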

INSIGHT

Human-Level AIs Are Likely Controllable

AI systems that are roughly human-level in capabilities and behavior are likely controllable with current techniques.
Humans cause problems too, but they can still be controlled through monitoring and constraints similar to those used in AI control.
