
"Model Organisms of Misalignment: The Case for a New Pillar of Alignment Research" by evhub, Nicholas Schiefer, Carson Denison, Ethan Perez
LessWrong (Curated & Popular)
00:00
Evaluating Misalignment in AI Systems
This chapter discusses the need to evaluate different potential forms of misalignment in AI systems separately and proposes a roadmap for developing models that demonstrate various subcomponents of AI takeover.