The Nonlinear Library cover image

The Nonlinear Library

AF - AI Control: Improving Safety Despite Intentional Subversion by Buck Shlegeris

Dec 13, 2023
Buck Shlegeris discusses AI Control and safety methods for preventing catastrophic failures caused by colluding AI instances. They explore securing code submissions with advanced AI models, address high-stakes AI safety challenges, and emphasize the importance of scalable oversight techniques. The podcast delves into preventing intentional subversion in AI systems and outlines future directions for enhancing AI control and safety protocols.
16:35

Episode guests

Podcast summary created with Snipd AI

Quick takeaways

  • Trusted editing in AI protocols can prevent subversion by powerful models, enhancing safety without hindering functionality.
  • High-stakes AI settings require robust protocols to prevent catastrophic outcomes, distinguishing them from low-stakes scenarios.

Deep dives

AI Safety Techniques to Prevent Catastrophes

The podcast discusses a paper that focuses on preventing AI catastrophes even if AI instances collude to subvert safety techniques. The paper explores protocols that are robust to intentional subversion by AI models. One key protocol involves trusted editing where a powerful AI model writes code and a less capable trusted model edits it to remove suspicious parts before submission.

Remember Everything You Learn from Podcasts

Save insights instantly, chat with episodes, and build lasting knowledge - all powered by AI.
App store bannerPlay store banner