AI-powered
podcast player
Listen to all your favourite podcasts with AI-powered features
The Importance of Mechanistic Interpretability
I think where I was gonna go with this is like, you know, people talk about like do by default or alignment by default. Like I think it might be kind of statistical. We just need to get more ways of like increasing the likelihood that we can control our models and understand what's going on in them. But I don't think of it as a binary of like works and not works. We're gonna develop more. And I do think that over the next two to three years, we're gonna start eating that probability mass of ways things can go wrong.