16 - Preparing for Debate AI with Geoffrey Irving

AXRP - the AI X-risk Research Podcast

CHAPTER

Is There a Language Model in Isolation?

I've just realized I have a few questions. So, yeah, it seems like, with the adversarial part, there's a cool synergy between this and the previous paper, right? Where, ideally, your language model would say a thing and generate some evidence. And then you could use red teaming to check if that evidence was confabulated or misleading. But we haven't done that for this particular paper. We are doing [unclear] interpretability work at DeepMind. It's just that this is a more complicated system than just a language model in isolation. I would want to not put all the pieces together too early.
