
1 - Adversarial Policies with Adam Gleave
AXRP - the AI X-risk Research Podcast
I Think We Didn't Choose a Particular Layer of a Network
There wasn't any significant difference, within confidence intervals, between different normal opponents. I think the only exception was in Sumo. For some reason, [unclear] seemed to be more different than usual. And if you fit a density model on opponent one, how surprising are the activations induced by opponent two? Very generally, they were pretty hard to distinguish.
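To make the comparison described above concrete, here is a minimal sketch of the idea: fit a density model to the activations recorded against opponent one, then score how likely the activations recorded against opponent two are under that model. The excerpt does not say which density model or which layer was used, so the diagonal Gaussian here is a stand-in chosen for simplicity. The function names and the synthetic arrays are hypothetical and are not data from the episode.

```python
import numpy as np


def fit_diag_gaussian(acts):
    """Fit an independent Gaussian to each activation dimension."""
    mean = acts.mean(axis=0)
    var = acts.var(axis=0) + 1e-6  # small floor so the variance is never zero
    return mean, var


def mean_log_likelihood(acts, mean, var):
    """Average per-sample log-density under the fitted diagonal Gaussian."""
    per_dim = -0.5 * (np.log(2 * np.pi * var) + (acts - mean) ** 2 / var)
    return per_dim.sum(axis=1).mean()


rng = np.random.default_rng(0)

# Synthetic stand-ins for recorded activations (assumption, not real data).
acts_opp1_train = rng.normal(0.0, 1.0, size=(2000, 8))
acts_opp1_held = rng.normal(0.0, 1.0, size=(2000, 8))
acts_opp2 = rng.normal(0.1, 1.0, size=(2000, 8))     # a similar opponent
acts_shifted = rng.normal(3.0, 1.0, size=(2000, 8))  # a clearly different one

mean, var = fit_diag_gaussian(acts_opp1_train)
ll_held = mean_log_likelihood(acts_opp1_held, mean, var)
ll_opp2 = mean_log_likelihood(acts_opp2, mean, var)
ll_shifted = mean_log_likelihood(acts_shifted, mean, var)

print(f"held-out opponent one: {ll_held:.2f}")
print(f"opponent two:          {ll_opp2:.2f}")
print(f"shifted distribution:  {ll_shifted:.2f}")
```

A lower average log-likelihood means the activations are more surprising to the model. In this toy setup the similar opponent scores close to the held-out data from opponent one, which mirrors the "pretty hard to distinguish" case, while the shifted distribution scores much lower.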