1 - Adversarial Policies with Adam Gleave

AXRP - the AI X-risk Research Podcast

I Think We Didn't Choose a Particular Layer of a Network

There wasn't any significant difference, within confidence intervals, between different normal opponents. I think the only exception was in Sumo. For some reason, [unclear] seemed to be more different than usual. If you fit a density model on opponent one, how surprising are the activations induced by opponent two? Very generally, they're pretty hard to distinguish.
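The comparison described here (fit a density model on the activations one opponent induces, then ask how surprising another opponent's activations are under it) can be sketched roughly as follows. This is a minimal illustration, not the method from the episode: it assumes a single full-covariance Gaussian as the density model and uses synthetic arrays in place of real network activations. The names `fit_gaussian`, `mean_log_likelihood`, and the `acts_*` arrays are hypothetical.

```python
import numpy as np


def fit_gaussian(acts, ridge=1e-6):
    """Fit a full-covariance Gaussian to activations of shape (n, d)."""
    mean = acts.mean(axis=0)
    # A small ridge term keeps the covariance matrix invertible.
    cov = np.cov(acts, rowvar=False) + ridge * np.eye(acts.shape[1])
    return mean, cov


def mean_log_likelihood(acts, mean, cov):
    """Average Gaussian log-density of each row of acts."""
    d = acts.shape[1]
    diff = acts - mean
    _, logdet = np.linalg.slogdet(cov)
    # Squared Mahalanobis distance of each row from the fitted mean.
    maha = np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)
    return float(np.mean(-0.5 * (d * np.log(2 * np.pi) + logdet + maha)))


# Synthetic stand-ins for activations (assumption: 8 units, 2000 timesteps).
rng = np.random.default_rng(0)
acts_opponent_one = rng.normal(0.0, 1.0, size=(2000, 8))
acts_opponent_two = rng.normal(0.1, 1.0, size=(2000, 8))  # similar opponent
acts_shifted = rng.normal(3.0, 1.0, size=(2000, 8))       # very different one

# Fit on opponent one, then score every set of activations under that model.
mean, cov = fit_gaussian(acts_opponent_one)
ll_one = mean_log_likelihood(acts_opponent_one, mean, cov)
ll_two = mean_log_likelihood(acts_opponent_two, mean, cov)
ll_shifted = mean_log_likelihood(acts_shifted, mean, cov)

print(f"opponent one: {ll_one:.2f}")
print(f"opponent two: {ll_two:.2f}")
print(f"shifted:      {ll_shifted:.2f}")
```

A lower average log-likelihood means the activations are more surprising under the model fit on opponent one. In this toy setup the two similar opponents score almost the same, which is the "hard to distinguish" case, while the shifted one scores far lower.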

