
AI Control: Improving Safety Despite Intentional Subversion
AI Safety Fundamentals: Alignment
Introduction
This chapter explores techniques for safeguarding AI systems against intentional subversion. It covers a paper on AI Control that aims to prevent catastrophic outcomes and discusses strategies for mitigating risks from colluding AIs.