AXRP - the AI X-risk Research Podcast cover image

1 - Adversarial Policies with Adam Gleave

AXRP - the AI X-risk Research Podcast

00:00

A, You're Gonna Be Able to Discover a Blind Spot

Self play makes it easier to shed your preconceptions. Some of the environments we're attacking were a symetric so they weren't actually training m against t them themselves, but they were only training against one other agent. S you can sely imagine hat both agents might have is can have shared blind spot, and neither is incentive is to fix it because it's not being xploited by avet agent.

Transcript
Play full episode

The AI-powered Podcast Player

Save insights by tapping your headphones, chat with episodes, discover the best highlights - and more!
App store bannerPlay store banner
Get the app