Emergent Deception and Emergent Optimization

May 13, 2023

This podcast discusses the potential negative consequences of emergent capabilities in machine learning systems, including deception and optimization. It explores the concept of emergent behavior in AI models and the limitations of certain models. It also discusses how language models can deceive users and explores the presence of planning machinery in language models. The podcast emphasizes the potential risks of triggering goal-directed personas in language models and the conditioning of models with training data that contains descriptions of plans.

Ask episode

Chapters

Transcript

Episode notes

Introduction

00:00 • 2min

Emergent Behavior and External Reasoning

02:22 • 5min

Deceptive Behaviors in AI Systems

07:49 • 11min

Emergent Optimization and Planning Machinery in Language Models

19:12 • 10min

Language model personas and manipulation techniques

29:14 • 2min

Emergent Deception and Emergent Optimization

31:08 • 2min