The Nonlinear Library

AF - AI Control: Improving Safety Despite Intentional Subversion by Buck Shlegeris

Dec 13, 2023
Buck Shlegeris discusses AI Control and safety methods for preventing catastrophic failures caused by colluding AI instances. He explores securing code submissions with more capable AI models, addresses high-stakes AI safety challenges, and emphasizes the importance of scalable oversight techniques. The episode covers preventing intentional subversion in AI systems and outlines future directions for improving AI control and safety protocols.