1 - Adversarial Policies with Adam Gleave

AXRP - the AI X-risk Research Podcast

How to Control a Victim Policy in Sumo?

In a new study, researchers looked at the activations of victim policies playing against adversarial and normal opponents. They found that the adversarial policy very reliably induced activations that would be extremely unlikely against normal opponents. This suggests the adversary is not merely pushing the victim off distribution but is systematically finding a particular part of the state space. The experiments discussed here take place in simulated sumo wrestling.
