
Collin Burns On Discovering Latent Knowledge In Language Models Without Supervision

The Inside View

00:00

The Misaligned AI System

I think one of the main intuitions people often have in alignment for why alignment is hard is that if you have a misaligned AI system, it seems really impossible to distinguish it from an aligned AI system, because the misaligned AI system could be actively lying, and it's superhuman, so you can't tell when it's lying, and so on. So I think intuitively this should feel easier, because you have access both to the misaligned system and also to the truth, or something like this. It's not as worst-case as some types of misaligned AI systems you can run into in alignment; it's actually less adversarial than many settings like that.

