
Neel Nanda on Avoiding an AI Catastrophe with Mechanistic Interpretability
Future of Life Institute Podcast
How Promising Is Mechanistic Interpretability?
"We need interpretability research to understand what's going on, so that we can see whether what we're doing is actually working," Nanda says. "We need it to create feedback loops between building a system, seeing how the system works, and then improving the system. But I think there's a bunch of other routes by which we might reach this goal. The field of 'make AI not kill everyone' is a healthier field if it has people pursuing a bunch of approaches."