How to Find Bad Neurons That Do Things You Don't Want
This is a pretty different approach from what you've seen, even if it is pretty logical in terms of how it is implemented. It's not surprising that this works; you can do this as a human pretty easily. Now, the other question is, how is this actually useful beyond explainability? There's an argument to be made that you could find bad neurons that do things you don't want. Maybe you can edit out certain knowledge or certain ideas within a language model. So there are some caveats on the impact and significance of this, but still, it is very cool.