
AI Control: Improving Safety Despite Intentional Subversion
AI Safety Fundamentals: Alignment
Exploring Code Back-Dooring and AI Control for Safety
This chapter explores the theory of change and the complexities of ensuring AI safety. It discusses code back-dooring, controlling access to GPT-4 for auditing, and the need for advanced auditing methods and for AI control in policy and standards.