
16 - Preparing for Debate AI with Geoffrey Irving
AXRP - the AI X-risk Research Podcast
Is There a Language Model in Isolation?
I've just realized I have a few questions. So, yeah, it seems like, with the adversarial part, there's a cool synergy with this in the previous paper, right? Where, ideally, your language model would say a thing and generate some evidence, and then you could use red teaming to check if that evidence was confabulated or misleading. But we haven't done that for this particular paper. We are doing, like, language model interpretability work at DeepMind. It's just that this is a more complicated system than just a language model in isolation. I would want to not put all the pieces together too early.