6 - Debate and Imitative Generalization with Beth Barnes

AXRP - the AI X-risk Research Podcast

CHAPTER

The Importance of More Interpretability

Beth Barnes: I'm really excited about people just doing a bunch more interpretability. There's some kind of window for a sort of treacherous turn, the gap between when a model is smart enough to think about deception and when it's good enough to get away with it. Even if we can just do very crude things, like, you know, is this model just thinking about some category of stuff it really shouldn't be thinking about or using without an agent? And I think even if you think that's just way too demanding, and we're never going to be able to do anything like that.
