
Neel Nanda on Avoiding an AI Catastrophe with Mechanistic Interpretability
Future of Life Institute Podcast
How Promising Is Mechanistic Interpretability?
"We need interpretability research to understand what's going on, so that we can see whether what we're doing is actually working," Nanda says. "We need it to create feedback loops between building a system, seeing how the system works, and then improving the system. But I think there's a bunch of other routes by which we might reach this goal. The field of 'make AI not kill everyone' is a healthier field if it has people pursuing a bunch of approaches."