Emergent Deception and Emergent Optimization
This chapter explores emergent behavior in language models and how it relates to planning capabilities. It discusses how models are conditioned on training data that contains descriptions of plans, the risks that can arise from this, and the need to address those risks for the safety of future models.