
AI Control: Improving Safety Despite Intentional Subversion

AI Safety Fundamentals: Alignment

CHAPTER

Exploring Code Back-Dooring and AI Control for Safety

Exploring the theory of change behind AI control and the complexities of ensuring AI safety, this chapter discusses code back-dooring as a threat model, techniques for controlling and auditing access to a powerful model such as GPT-4, and the need for advanced auditing methods and for AI control measures in policy and standards.
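
To make the control-and-auditing idea discussed in the chapter concrete, below is a minimal illustrative sketch (not from the episode) of a simple "trusted monitoring" style protocol: an untrusted model produces code, a trusted monitor assigns each solution a suspicion score, and only the most suspicious fraction is sent to a limited human audit budget. All names, scores, and thresholds here are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Solution:
    """A code solution produced by the untrusted model."""
    task_id: int
    code: str
    suspicion: float  # score assigned by a trusted monitor (hypothetical)


def select_for_audit(solutions: list[Solution], audit_budget: float) -> list[Solution]:
    """Flag the most suspicious solutions for human auditing.

    audit_budget is the fraction of tasks we can afford to audit
    (e.g. 0.02 for 2%); the rest are accepted as-is in this sketch.
    """
    ranked = sorted(solutions, key=lambda s: s.suspicion, reverse=True)
    n_audits = max(1, int(len(ranked) * audit_budget))
    return ranked[:n_audits]


if __name__ == "__main__":
    # Toy data: pretend the trusted monitor has already scored each solution.
    batch = [
        Solution(i, f"def solve_{i}(): ...", suspicion=(i % 7) / 7)
        for i in range(100)
    ]
    flagged = select_for_audit(batch, audit_budget=0.02)
    print(f"Auditing {len(flagged)} of {len(batch)} solutions:",
          [s.task_id for s in flagged])
```

The design choice this illustrates is that auditing is treated as a scarce resource: the protocol's job is to spend that scarce human attention on the outputs most likely to contain an intentionally inserted backdoor.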
