
AI Control: Improving Safety Despite Intentional Subversion
AI Safety Fundamentals: Alignment
Exploring Code Back-Dooring and AI Control for Safety
This chapter explores the theory of change behind AI control and the complexities of ensuring AI safety. It discusses code back-dooring and controlled access to GPT-4 for auditing, and emphasizes the need for advanced auditing methods and for AI control in policy and standards.