#151 – Ajeya Cotra on accidentally teaching AI models to deceive us

80,000 Hours Podcast

CHAPTER

Navigating Deception and Anomaly Detection in AI Systems

This chapter examines the challenge of detecting unusual behavior in neural networks, specifically identifying deceptive actions when a model's behavior mixes honest and dishonest responses. The discussion covers methods for anomaly detection and the importance of setting achievable goals for AI systems, so as to minimize deception while keeping those systems safe and capable.
