
On Emergent Misalignment
Don't Worry About the Vase Podcast
00:00
Exploring Correlations and Misalignments in Large Language Models
This chapter explores low decoupling in machine learning, particularly in large language models. It highlights how contextual cues shape model responses, and how vulnerabilities in the training data can lead to misalignment that ranges from superficial to profound.