4min chapter

Collin Burns On Discovering Latent Knowledge In Language Models Without Supervision

The Inside View

CHAPTER

How Do We Distinguish Between the Truth and the Misaligned System?

"There are a couple of things that change once you scale up models," he says. "The first worry is: okay, maybe the model doesn't represent whether this input is actually true or false to begin with. Maybe it just thinks about what a human would say is true or false, and so it doesn't actually represent its beliefs in a simple way internally." He also talks about how we might distinguish truth-like features from those of misaligned systems: "I think humans won't know answers to superhuman questions; mostly, I think they'll be like 50-50."
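The unsupervised probing idea the episode title refers to, contrast-consistent search (CCS), can be sketched on toy data. The probe is trained so that a statement and its negation receive complementary probabilities (consistency) while avoiding the degenerate everything-is-50-50 solution (confidence), with no truth labels used. The synthetic hidden states, dimensions, and hyperparameters below are illustrative assumptions, not anything from the episode:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_ccs(h_pos, h_neg, steps=1000, lr=0.1, seed=0):
    """Fit a linear probe on contrast-pair hidden states, without labels.

    h_pos[i] / h_neg[i] are hidden states for statement i and its negation.
    """
    rng = np.random.default_rng(seed)
    n, d = h_pos.shape
    theta = rng.normal(scale=0.1, size=d)
    b = 0.0
    for _ in range(steps):
        p_pos = sigmoid(h_pos @ theta + b)
        p_neg = sigmoid(h_neg @ theta + b)
        # CCS loss: consistency (p_pos should equal 1 - p_neg) plus
        # confidence (penalize the degenerate p_pos = p_neg = 0.5 answer).
        c = p_pos - (1.0 - p_neg)
        m = np.minimum(p_pos, p_neg)
        # Subgradients of mean(c**2 + m**2) w.r.t. p_pos and p_neg.
        g_pos = 2.0 * c + 2.0 * m * (p_pos <= p_neg)
        g_neg = 2.0 * c + 2.0 * m * (p_neg < p_pos)
        # Chain through the sigmoid: dp/dz = p * (1 - p).
        s_pos = g_pos * p_pos * (1.0 - p_pos)
        s_neg = g_neg * p_neg * (1.0 - p_neg)
        theta -= lr * (h_pos.T @ s_pos + h_neg.T @ s_neg) / n
        b -= lr * (s_pos.sum() + s_neg.sum()) / n
    return theta, b

def predict(theta, b, h_pos, h_neg):
    # Average the two (ideally complementary) probabilities.
    p = 0.5 * (sigmoid(h_pos @ theta + b) + 1.0 - sigmoid(h_neg @ theta + b))
    return (p > 0.5).astype(int)

# Toy data: a hidden "truth" direction v plus noise. If statement i is true
# (y = +1), h_pos leans along +v and h_neg along -v, and vice versa.
rng = np.random.default_rng(1)
n, d = 200, 8
v = np.zeros(d)
v[0] = 1.0
y = rng.integers(0, 2, size=n) * 2 - 1          # labels in {-1, +1}
h_pos = y[:, None] * v + rng.normal(scale=0.3, size=(n, d))
h_neg = -y[:, None] * v + rng.normal(scale=0.3, size=(n, d))

theta, b = train_ccs(h_pos, h_neg)
pred = predict(theta, b, h_pos, h_neg)
acc = np.mean(pred == (y == 1))
acc = max(acc, 1.0 - acc)
print(f"accuracy: {acc:.2f}")
```

Because no labels are used, the sign of the learned direction is arbitrary, hence the final `max(acc, 1 - acc)`: the probe can recover truth-versus-falsehood without ever being told which side is which.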
