Is There a Shift in Interpretability?

There was a shift from example-based concept-based interpretability that has occurred both in your own work and interpretability more broadly. And even as recently as I think 2016, you had this paper, examples are not enough learned to criticize where you were looking into this MMD critic method. Could you tell me a little bit about that shift in thinking? How that occurred for you?

Play episode from 32:32

Transcript

The AI-powered Podcast Player

Save insights by tapping your headphones, chat with episodes, discover the best highlights - and more!

Get the app