Collin Burns On Discovering Latent Knowledge In Language Models Without Supervision

The Inside View

00:00

Do Models Deliberately Lie?

What do we mean by being truthful or lying here? Is it, like, purposefully lying, deceiving? Yeah, so I think there are different notions of truthfulness and honesty and all these related terms. Currently, models don't really deliberately lie in a super meaningful sense, but they can lie if you prompt them in a way that causes them to deliberately output false things, sort of knowingly in some sense. Once you get to RL systems, then I think you can get more egregious forms of deliberate lying, he says.
