
Nicholas Schiefer
Research scientist working on model organisms of misalignment
Best podcasts with Nicholas Schiefer
Ranked by the Snipd community

17 snips
Aug 9, 2023 • 36min
"Model Organisms of Misalignment: The Case for a New Pillar of Alignment Research" by evhub, Nicholas Schiefer, Carson Denison, Ethan Perez
This episode discusses why researching model organisms of misalignment matters for understanding the causes of alignment failures in AI systems. It covers strategies for model training and deployment, such as input tagging and evaluating output with a preference model, and examines the risks associated with using model organisms in research, including deceptive alignment.