"The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis

AI Control: Using Untrusted Systems Safely with Buck Shlegeris of Redwood Research, from the 80,000 Hours Podcast



Navigating Untrusted AI Monitoring

This chapter explores strategies for testing untrusted AI models in simulated environments, focusing on how red and blue teams identify potential failures. The discussion covers control mechanisms and the difficulty of monitoring AI systems effectively, including the challenges of making tests realistic and detecting malicious actions. It closes by stressing the importance of evaluating threats and understanding how AI models could coordinate covertly, which poses a significant challenge for oversight.
