Emergent Deception and Emergent Optimization

This chapter explores emergent behavior in language models and how it relates to their planning capabilities. It discusses how models are conditioned by training data that contains descriptions of plans, the risks that can arise from this, and why these risks must be addressed to keep future models safe.

This chapter starts at 31:08 in the episode.
