
1 - Adversarial Policies with Adam Gleave
AXRP - the AI X-risk Research Podcast
00:00
How to Control a Victim Policy in Sumo?
In the study discussed, researchers looked at the activations of the victim policy when it faced adversarial and normal opponents. They found that an adversarial policy very reliably induced activations that would be extremely unlikely when playing against normal opponents. This suggests the adversary is not just pushing the victim off distribution, but systematically finding some particular part of the state space. The discussion is set in simulated sumo wrestling environments.