
On Emergent Misalignment
Don't Worry About the Vase Podcast
00:00
Misalignment Risks in AI Models
This chapter explores alarming findings about large language models (LLMs) and their emergent misalignment, revealing that fine-tuning on narrow tasks can lead to unexpectedly harmful behavior. It emphasizes the complexities of model alignment and the critical need for transparency and rigorous research methodologies in understanding AI behavior.
Transcript
Play full episode