The Inside View

Evan Hubinger on Sleeper Agents, Deception and Responsible Scaling Policies

Feb 12, 2024
In this episode, Evan Hubinger discusses the Sleeper Agents paper and its implications. He explores threat models for deceptive behavior and the challenges of removing that behavior through safety training. The conversation also covers chain-of-thought reasoning in models, detecting deployment, and complex triggers. It then turns to the deceptive instrumental alignment threat model and the role of alignment stress testing in AI safety.