
6 - Debate and Imitative Generalization with Beth Barnes

AXRP - the AI X-risk Research Podcast


The Importance of More Interpretability

Beth Barnes: I'm really excited about people just doing a bunch more interpretability. There's some kind of window for a treacherous turn, between when a model is smart enough to think about deception and when it's good enough to get away with it.

Beth Barnes: Even if we can just do very crude things, like, you know, is this model thinking about some category of stuff it really shouldn't be thinking about or using without an agent? And I think even you think that's just, like, way too demanding, and we're never going to be able to do anything like that.
