
Collin Burns On Discovering Latent Knowledge In Language Models Without Supervision

The Inside View

CHAPTER

How to Discover Latent Knowledge in Language Models

In our paper we basically show that if you just have access to a language model's unlabeled activations, you can identify whether text is true or false. There are lots of subtleties here, like what do we mean by "knowing" and "truth"? These are really important, and there are lots of logical consistency properties in language models.
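The idea Collin describes can be sketched as a small objective over a model's activations. This is a minimal illustrative sketch of the contrast-consistent objective from the paper: a probe maps the hidden state of a statement and of its negation to probabilities, and is penalized when the two probabilities fail to be complementary (consistency) or when it collapses to the uninformative answer of 0.5 everywhere (confidence). The probe shape and the function names here are assumptions for illustration, not the paper's exact implementation.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def probe(theta, bias, h):
    # Hypothetical linear probe: maps a hidden-state vector h to a probability.
    return sigmoid(sum(t * x for t, x in zip(theta, h)) + bias)

def ccs_loss(p_pos, p_neg):
    """Contrast-consistency objective over paired probabilities.

    p_pos[i] is the probe's probability for statement i, p_neg[i] the
    probability for its negation.  Consistency pushes them to be
    complementary; confidence rules out the degenerate p = 0.5 solution.
    """
    n = len(p_pos)
    consistency = sum((pp - (1.0 - pn)) ** 2 for pp, pn in zip(p_pos, p_neg)) / n
    confidence = sum(min(pp, pn) ** 2 for pp, pn in zip(p_pos, p_neg)) / n
    return consistency + confidence

# A probe that answers complementarily and confidently gets zero loss;
# the degenerate constant-0.5 probe is penalized.
print(ccs_loss([1.0], [0.0]))   # perfectly consistent and confident
print(ccs_loss([0.5], [0.5]))   # degenerate "always unsure" solution
```

Notice that the loss is defined purely over pairs of probabilities, with no true/false labels anywhere, which is what makes the method unsupervised.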

