AI Sleeper Agents

Astral Codex Ten Podcast

AI Awareness and Deceptive Behavior: Exploring Scenarios

This chapter discusses how aware AI models are of their situation and how willing they are to follow human instructions. It explores scenarios of deceptive behavior in AI, including deliberate deception by humans and the AI's own decision to deceive, as well as the potential risks of training data attacks and of AI goals that are misaligned with those of humans.
