
Collin Burns On Discovering Latent Knowledge In Language Models Without Supervision

The Inside View


How to Discover Latent Knowledge in Language Models

In our paper we basically show that if you just have access to a language model's unlabeled activations, you can identify whether text is true or false. There are lots of subtleties here, like what we mean by "knowing" and "truth" — these are really important — and there are lots of logical consistency properties in language models.
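The idea sketched in the excerpt — finding truth in unlabeled activations by exploiting logical consistency — is the Contrast-Consistent Search (CCS) objective from the paper. Below is a minimal sketch of that objective, assuming a linear probe over activations; the function names and the toy activations are illustrative, not the paper's code. For a statement and its negation, the probe's probabilities should sum to one (consistency), and the probe should not collapse to answering 0.5 everywhere (confidence):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ccs_loss(w, b, x_pos, x_neg):
    """CCS-style loss on a contrast pair of activations.

    x_pos: activations for statements phrased as true ("X. Yes.")
    x_neg: activations for the same statements negated ("X. No.")
    The probe p(x) = sigmoid(w.x + b) is trained without any labels:
    - consistency: p(x_pos) should equal 1 - p(x_neg)
    - confidence: the degenerate solution p = 0.5 is penalized
    """
    p_pos = sigmoid(x_pos @ w + b)
    p_neg = sigmoid(x_neg @ w + b)
    consistency = (p_pos - (1.0 - p_neg)) ** 2
    confidence = np.minimum(p_pos, p_neg) ** 2
    return float(np.mean(consistency + confidence))

# Toy 1-D example (hypothetical activations): a probe that cleanly
# separates the pair gets a near-zero loss ...
good = ccs_loss(np.array([1.0]), 0.0, np.array([[5.0]]), np.array([[-5.0]]))

# ... while the degenerate "always 0.5" probe pays the confidence penalty.
degenerate = ccs_loss(np.array([0.0]), 0.0, np.array([[5.0]]), np.array([[-5.0]]))
```

In the paper this loss is minimized over the probe parameters across many contrast pairs, so the probe is never told which statements are actually true — the consistency structure alone picks out a truth-like direction.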

