
"Model Organisms of Misalignment: The Case for a New Pillar of Alignment Research" by evhub, Nicholas Schiefer, Carson Denison, Ethan Perez
LessWrong (Curated & Popular)
Using Model Organisms to Understand Alignment Failures
This chapter focuses on why studying model organisms matters for understanding the fundamental causes of alignment failures in AI models. It also discusses building a phase diagram and testing alignment techniques for catching or mitigating deception.