Using Model Organisms to Understand Alignment Failures

This chapter explains why studying model organisms can reveal the fundamental causes of alignment failures in AI models. It also discusses building a phase diagram and testing alignment techniques for catching or mitigating deception.
