Collin Burns On Discovering Latent Knowledge In Language Models Without Supervision

The Inside View

CHAPTER

The Misaligned AI System

I think one of the main intuitions people often have in alignment, for why alignment is hard, is that if you have a misaligned AI system, it seems really impossible to distinguish it from a truthful, aligned AI system, because the misaligned AI system could be actively lying, and it's superhuman, so you can't tell when it's lying, and so on. So I think intuitively this should feel easier, because here you have access both to the misaligned system and also to the truth, or something like this. It's not as worst-case as some types of misaligned AI systems you can run into in alignment; it's actually less adversarial than many settings like that.
