21 - Interpretability for Engineers with Stephen Casper

AXRP - the AI X-risk Research Podcast

The Relationship Between Anomalies and Interpretability

A team of researchers trained a model that beats state-of-the-art Go-playing AI systems. They did so by training it against the behavior of the system it was attacking. The work suggests that even superhuman systems may have silly vulnerabilities, says Andrew Keen. "High-frequency, non-robust, non-interpretable features are kind of the enemy of interpretability."