"Model Organisms of Misalignment: The Case for a New Pillar of Alignment Research" by evhub, Nicholas Schiefer, Carson Denison, Ethan Perez

LessWrong (Curated & Popular)

The Case Against Model Organisms in Research

This chapter explores the risks associated with using model organisms in research, including the potential for deceptive alignment. It discusses the need for scaling laws, fast discovery, and studying deceptive alignment, while also addressing the lack of empirical evidence and proposing testing methods for models with varying levels of effective compute.
