#76 – Joe Carlsmith on Scheming AI


Mar 16, 2024

Joe Carlsmith discusses the risk that AI systems could be misaligned and deceptive during training, exploring the concept of scheming AI. The episode covers the distinctions between the different types of models that training might produce, the dangers of scheming behavior, and the complexities of AI goals and motivations. It also examines the difficulty of detecting scheming AI early, the importance of managing AI systems' long-term motivations, and the uncertainties that remain about how AI models are trained.

Chapters

Unveiling Scheming AI: Understanding Deception and Training Gaming

02:36 • 18min

Distinguishing Types of Models in Training

Exploring the Relationship Between Human Evolution and Reward Optimization

The Dangers of Scheming AI Models

27:52 • 25min

Exploring Beyond-Episode Goals in AI Training

53:01 • 20min

Exploring Simplicity and Scheming in AI Models

01:13:04 • 26min

Understanding Slack and Scheming in AI Models

01:39:16 • 7min

Navigating the Challenges of Influence on Advanced AI

01:45:51 • 2min

Exploring the Uncertainties of Scheming AI

01:47:22 • 4min