
AI Control: Improving Safety Despite Intentional Subversion
AI Safety Fundamentals: Alignment
Introduction
This chapter introduces a paper on AI Control, which explores techniques for keeping AI systems safe despite intentional subversion. The paper aims to prevent catastrophic outcomes, and the chapter discusses strategies to mitigate risks from colluding AIs.