Is Mechanistic Interpretability the Only Path to Understanding Neural Networks?
I would really prefer to live in a world where we actually understand these things. Mechanistic interpretability is not obviously the only path to a point where we can make some claim to understand these systems, but it is a promising one, and it achieves a pretty high level of rigor and reliability in that understanding. The more we can claim that we actually understand what's going on inside these models, the better off we'll be.