Emergent Deception and Emergent Optimization

AI Safety Fundamentals

This chapter explores emergent behavior in language models and how it relates to planning capabilities. It discusses how models are conditioned on training data that contains descriptions of plans, the risks that can arise from this, and the need to address those risks to keep future models safe.
