2min chapter

Collin Burns On Discovering Latent Knowledge In Language Models Without Supervision

The Inside View

CHAPTER

The Misaligned AI System

I think one of the main intuitions people often have in alignment for why alignment is hard is that if you have a misaligned AI system, it seems really impossible to distinguish it from an aligned AI system, because the misaligned AI system could be actively lying, and it's superhuman, so you can't tell when it's lying, and so on. So I think intuitively this setting should feel easier, because you have access both to the misaligned system and also to the truth, or something like this. It's not as worst-case as some types of misaligned AI systems you can run into in alignment; it's actually less adversarial than many settings like that.
