21 - Interpretability for Engineers with Stephen Casper

AXRP - the AI X-risk Research Podcast

The Failure of Saliency and Attribution Methods

More successes than failures? Sorry, more failures than successes. Yeah. I guess it is sort of embarrassing that they couldn't, like, that you have some image with a cartoon smiley face, and that cartoon smiley face is this trigger to get it classified. So much work has been put into feature attribution and saliency research in recent years. It's worth noting that these do have potential practical, societal, and legal uses. But from an engineer's standpoint, right? Sorry, I'm just missing something.
