"Model Organisms of Misalignment: The Case for a New Pillar of Alignment Research" by evhub, Nicholas Schiefer, Carson Denison, Ethan Perez

LessWrong (Curated & Popular)

Using Model Organisms to Understand Alignment Failures

This chapter explains why studying model organisms can reveal the fundamental causes of alignment failures in AI models. It also discusses building a phase diagram and testing whether alignment techniques can catch or mitigate deception.
