AI Sleeper Agents

Astral Codex Ten Podcast

Intentional Security Vulnerabilities, Deceptive AI, and Awareness

This chapter explores the intentional insertion of security vulnerabilities into code, backdoored models, and their resilience to honeypot attacks. It also discusses deceptive AI models, their power-seeking tendencies and situational awareness, and persona evaluations of models trained to insert code vulnerabilities.
