

#76 – Joe Carlsmith on Scheming AI
Mar 16, 2024
Joe Carlsmith discusses the risk that AI systems could be deceptive and misaligned during training, and explores the concept of scheming AI. The episode covers the distinctions between different types of models in training, the dangers of scheming behavior, and the complexities of AI goals and motivations. It also addresses the challenge of detecting scheming early, the importance of managing long-term AI motivations, and the uncertainties surrounding the training of AI models.
Chapters
Introduction
00:00 • 3min
Unveiling Scheming AI: Understanding Deception and Training Gaming
02:36 • 18min
Distinguishing Types of Models in Training
20:09 • 6min
Exploring the Relationship Between Human Evolution and Reward Optimization
26:15 • 2min
The Dangers of Scheming AI Models
27:52 • 25min
Exploring Beyond-Episode Goals in AI Training
53:01 • 20min
Exploring Simplicity and Scheming in AI Models
01:13:04 • 26min
Understanding Slack and Scheming in AI Models
01:39:16 • 7min
Navigating the Challenges of Influence on Advanced AI
01:45:51 • 2min
Exploring the Uncertainties of Scheming AI
01:47:22 • 4min