#151 – Ajeya Cotra on accidentally teaching AI models to deceive us

80,000 Hours Podcast

Navigating Deception and Anomaly Detection in AI Systems

This chapter examines the challenge of detecting unusual behavior in AI neural networks, in particular picking out deceptive actions from a mix of honest and dishonest behavior. The discussion covers methods for anomaly detection and the importance of setting achievable goals for AI systems, so that deception is minimized while safety and capability are maintained.
