
Collin Burns On Discovering Latent Knowledge In Language Models Without Supervision

The Inside View


Using Logical Consistency to Predict Next Tokens

Truth has a particularly special structure: like I said before, it's logically consistent. This is unusual. If you just take a random feature, like sentiment, or whether this token is a noun or a verb, or something like that which might be represented in language models, it won't satisfy logical consistency. So we can actually find truth-like features in the model by using logical consistency, and so on.
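A minimal sketch of this consistency idea, in the spirit of the Contrast-Consistent Search (CCS) method from the paper named in the episode title. The probe class, tensor names, and hyperparameters below are illustrative assumptions, not the paper's exact code: the key point is that a statement and its negation should receive probabilities that sum to 1, which a random feature like sentiment would not satisfy.

```python
import torch
import torch.nn as nn

class TruthProbe(nn.Module):
    """Maps a hidden state to a probability that the statement is true."""
    def __init__(self, hidden_dim: int):
        super().__init__()
        self.linear = nn.Linear(hidden_dim, 1)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.linear(h))

def ccs_loss(probe: TruthProbe,
             h_pos: torch.Tensor,
             h_neg: torch.Tensor) -> torch.Tensor:
    """Unsupervised loss: logical consistency (p(x) + p(not x) should be 1)
    plus a confidence term so the probe doesn't collapse to always 0.5."""
    p_pos = probe(h_pos)  # probability assigned to "x is true"
    p_neg = probe(h_neg)  # probability assigned to "x is false"
    consistency = ((p_pos - (1 - p_neg)) ** 2).mean()
    confidence = (torch.min(p_pos, p_neg) ** 2).mean()
    return consistency + confidence

# Usage sketch (hidden states h_pos / h_neg come from the language model
# run on the statement and its negation; shapes are assumptions):
# probe = TruthProbe(hidden_dim=768)
# opt = torch.optim.Adam(probe.parameters(), lr=1e-3)
# loss = ccs_loss(probe, h_pos, h_neg)
# loss.backward(); opt.step()
```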
