
Emergent Deception and Emergent Optimization
AI Safety Fundamentals
This chapter explores emergent behavior in language models and how it relates to planning capabilities. It discusses how models are conditioned on training data that contains descriptions of plans and the risks that may arise from this, emphasizing the need to address those risks to keep future models safe.