1 - Adversarial Policies with Adam Gleave

AXRP - the AI X-risk Research Podcast

I Think We Didn't Choose a Particular Layer of a Network

There wasn't any significant difference, within confidence intervals, between different normal opponents. I think the only exception was in Sumo. For some reason, one opponent seemed to be more different than usual. And if you fit a density model on opponent one, how surprising are the activations induced by opponent two? Very generally, they're pretty hard to distinguish.
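The comparison described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the activations are synthetic, the density model is a single diagonal Gaussian swapped in for simplicity (the excerpt does not say which density model was used), and every function and variable name is hypothetical.

```python
import math
import random


def fit_diag_gaussian(samples):
    """Fit an independent Gaussian to each activation dimension."""
    n = len(samples)
    dim = len(samples[0])
    means = [sum(s[d] for s in samples) / n for d in range(dim)]
    # A small floor keeps the variance strictly positive.
    variances = [
        sum((s[d] - means[d]) ** 2 for s in samples) / n + 1e-6
        for d in range(dim)
    ]
    return means, variances


def mean_log_likelihood(samples, means, variances):
    """Average log-density per sample; lower means more surprising."""
    total = 0.0
    for s in samples:
        for x, m, v in zip(s, means, variances):
            total += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return total / len(samples)


def synthetic_activations(rng, n, dim, shift):
    """Stand-in for activations recorded while facing some opponent (assumption)."""
    return [[rng.gauss(shift, 1.0) for _ in range(dim)] for _ in range(n)]


rng = random.Random(0)
acts_opponent_one = synthetic_activations(rng, 500, 8, shift=0.0)
acts_opponent_two = synthetic_activations(rng, 500, 8, shift=0.1)  # similar opponent
acts_unusual = synthetic_activations(rng, 500, 8, shift=3.0)       # very different opponent

# Fit the density model on opponent one, then score the others under it.
means, variances = fit_diag_gaussian(acts_opponent_one)
ll_one = mean_log_likelihood(acts_opponent_one, means, variances)
ll_two = mean_log_likelihood(acts_opponent_two, means, variances)
ll_unusual = mean_log_likelihood(acts_unusual, means, variances)

print("opponent one:", ll_one)
print("opponent two:", ll_two)
print("unusual opponent:", ll_unusual)
```

In this toy setup, two similar opponents produce nearly the same average log-likelihood, which is the sense in which they are hard to distinguish, while a very different opponent scores far lower.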
