Ethan Perez–Inverse Scaling, Language Feedback, Red Teaming

The Inside View

Is There a Way to Measure Deception or Measure Lies?

Inverse scaling will not be able to catch anything that inverse scaling won't. Yeah, in some way, you're still out of distribution. I think it's at least worth testing if we're in the world where we are getting these U-shaped or inverse scaling curves. Yes, I think this is something that I'm pretty interested to find out. And so that makes it very tricky to reason about the behaviours and the output probabilities and things like that that the model's giving. They might be intentionally trying to make us trust the model. But I think it's unclear to me what intermediate models will do.
