AI Safety Fundamentals

Emergent Deception and Emergent Optimization

May 13, 2023
This podcast discusses the potential negative consequences of emergent capabilities in machine learning systems, including deception and optimization. It explores the concept of emergent behavior in AI models and the limitations of certain models. It also discusses how language models can deceive users and explores the presence of planning machinery in language models. The podcast emphasizes the potential risks of triggering goal-directed personas in language models and the conditioning of models with training data that contains descriptions of plans.
Ask episode
Chapters
Transcript
Episode notes