
21 - Interpretability for Engineers with Stephen Casper

AXRP - the AI X-risk Research Podcast


Benchmarking Interpretability Tools for Deep Neural Networks

Daniel Filan: I think this is a good segue to talk about your paper, 'Benchmarking Interpretability Tools for Deep Neural Networks', which you co-authored with Yuxiao Li, Jiawei Li, Tong Bu, Kevin Zhang, and Dylan Hadfield-Menell. It's basically about benchmarking interpretability tools on whether they can detect certain Trojans that you implant in networks.

Casper says the reasons they use Trojans are largely born out of consistency: having a well-known ground truth makes the tools easy to evaluate.

Stephen Casper: The final reason why it's useful to use Trojans is that I think Trojans…
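The snippet describes implanting Trojans so that interpretability tools can be benchmarked against a known ground truth. As a rough illustration of that idea, here is a minimal PyTorch sketch of a data-poisoning Trojan: a fixed trigger patch plus a target label. The patch shape, target class, and poison fraction are all illustrative assumptions, not the paper's actual setup.

```python
# A minimal sketch (not the paper's code) of a data-poisoning Trojan.
# Any image stamped with a fixed patch is relabeled to a chosen target
# class; a network trained on the poisoned data learns the
# trigger -> target association. Because the evaluator chose the
# trigger, it serves as a known ground truth for interpretability tools.
# All constants below are illustrative assumptions.
import torch
import torch.nn as nn

TARGET_CLASS = 7          # assumed Trojan target label
PATCH_SIZE = 4            # assumed trigger size in pixels
POISON_FRACTION = 0.05    # assumed fraction of training data to poison

def stamp_trigger(images: torch.Tensor) -> torch.Tensor:
    """Place a fixed white square in the bottom-right corner (the trigger).

    Assumes images are batched in NCHW layout with values in [0, 1].
    """
    poisoned = images.clone()
    poisoned[:, :, -PATCH_SIZE:, -PATCH_SIZE:] = 1.0
    return poisoned

def poison_batch(images: torch.Tensor, labels: torch.Tensor):
    """Stamp the trigger on a random subset and relabel it to TARGET_CLASS."""
    n_poison = max(1, int(POISON_FRACTION * len(images)))
    idx = torch.randperm(len(images))[:n_poison]
    images, labels = images.clone(), labels.clone()
    images[idx] = stamp_trigger(images[idx])
    labels[idx] = TARGET_CLASS
    return images, labels

@torch.no_grad()
def attack_success_rate(model: nn.Module, images: torch.Tensor) -> float:
    """Ground-truth check: a Trojaned model should flip clean predictions
    to TARGET_CLASS once the trigger is stamped on."""
    preds = model(stamp_trigger(images)).argmax(dim=-1)
    return (preds == TARGET_CLASS).float().mean().item()
```

Because the benchmark designer chose the patch and the target class, a feature-visualization or attribution tool can then be scored objectively on whether it surfaces the trigger, which is the well-known ground truth the snippet refers to.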

