

Emergent Deception and Emergent Optimization
May 13, 2023
This podcast discusses the potential negative consequences of emergent capabilities in machine learning systems, focusing on deception and optimization. It covers emergent behavior in AI models and the limitations of certain models, how language models can deceive users, and whether language models contain planning machinery. It emphasizes the potential risks of triggering goal-directed personas in language models, along with the conditioning of models on training data that contains descriptions of plans.
Chapters
Introduction
00:00 • 2min
Emergent Behavior and External Reasoning
02:22 • 5min
Deceptive Behaviors in AI Systems
07:49 • 11min
Emergent Optimization and Planning Machinery in Language Models
19:12 • 10min
Language Model Personas and Manipulation Techniques
29:14 • 2min
Emergent Deception and Emergent Optimization
31:08 • 2min