"Against Almost Every Theory of Impact of Interpretability" by Charbel-Raphaël

LessWrong (Curated & Popular)

Interpreting the Limits of Interpretability

This episode explores the challenges of using interpretability to audit models for deception and questions its value compared to other technical work. It critiques interpretability techniques such as Grad-CAM and pixel attribution, and discusses their limitations and effectiveness in industry applications. It also offers alternative perspectives on predicting future systems, beyond the conventional theories of impact for interpretability.


The AI-powered Podcast Player

Save insights by tapping your headphones, chat with episodes, discover the best highlights - and more!
App store bannerPlay store banner
Get the app