
1 - Adversarial Policies with Adam Gleave
AXRP - the AI X-risk Research Podcast
00:00
A, You're Gonna Be Able to Discover a Blind Spot
Self play makes it easier to shed your preconceptions. Some of the environments we're attacking were a symetric so they weren't actually training m against t them themselves, but they were only training against one other agent. S you can sely imagine hat both agents might have is can have shared blind spot, and neither is incentive is to fix it because it's not being xploited by avet agent.
Transcript
Play full episode
Remember Everything You Learn from Podcasts
Save insights instantly, chat with episodes, and build lasting knowledge - all powered by AI.