AI Control: Improving Safety Despite Intentional Subversion

AI Safety Fundamentals: Alignment

Exploring Code Back-Dooring and AI Control for Safety

Exploring the theory of change and the complexities of ensuring AI safety, the chapter discusses how a powerful model such as GPT-4 might insert backdoors into the code it writes, how a limited auditing budget can be used to catch such subversion, and why AI control should be emphasized in policy and standards.
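To make the auditing idea concrete, here is a minimal sketch of a trusted-monitoring-style control protocol in the spirit of the paper the episode covers: an untrusted model's code submissions are scored for suspicion by a weaker trusted monitor, and only the most suspicious fraction is sent to costly human audit. The names (`Solution`, `trusted_monitoring`, `audit_budget`) and the random suspicion scores are illustrative assumptions, not details from the episode.

```python
import random
from dataclasses import dataclass


@dataclass
class Solution:
    problem_id: int
    code: str
    suspicion: float  # score a trusted (weaker) monitor model would assign


def trusted_monitoring(solutions, audit_budget=0.02):
    """Flag the most suspicious fraction of untrusted-model outputs for human audit.

    Hypothetical sketch: in practice `suspicion` would come from asking a trusted
    model how likely each solution is to contain an intentional backdoor.
    """
    ranked = sorted(solutions, key=lambda s: s.suspicion, reverse=True)
    n_audit = max(1, int(len(ranked) * audit_budget))
    to_audit = ranked[:n_audit]    # sent to (expensive, limited) human auditors
    to_accept = ranked[n_audit:]   # accepted without audit
    return to_audit, to_accept


# Toy usage with random scores standing in for monitor outputs.
solutions = [Solution(i, f"def solve_{i}(): ...", random.random()) for i in range(100)]
audited, accepted = trusted_monitoring(solutions, audit_budget=0.02)
print(f"Auditing {len(audited)} of {len(solutions)} solutions")
```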
