Activation Patching Is a Great Way to Find Out What a Neuron Does
Activation patching is a way to flip the answer from Rome to Paris. It turns out that most things don't matter, some things matter a ton, and just patching in a single activation can often be enough to significantly flip things. There are also techniques that are much more suggestive, in that they give some evidence; I think they should be part of a toolkit you use with heavy caution. One example of this is a really dumb technique for figuring out what a neuron does: you look at its max activating dataset examples, and yep, they're all recipes.
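A minimal sketch of the patching loop being described, using a hand-built toy network in place of a real transformer (the weights, the two-unit hidden layer, and the France/Italy encoding are all illustrative assumptions, not anything from a real model): cache the hidden activations on a "clean" input, run a "corrupted" input, then overwrite a single hidden activation with its clean value and check whether the answer flips back.

```python
import numpy as np

# Toy 2-layer network: input features -> hidden (ReLU) -> logits over {Paris, Rome}.
# Hand-set weights (illustrative): hidden unit 0 encodes "France", unit 1 "Italy".
W1 = np.array([[1.0, 0.0],    # input feature "France" drives hidden unit 0
               [0.0, 1.0]])   # input feature "Italy" drives hidden unit 1
W2 = np.array([[3.0, -3.0],   # hidden unit 0 votes strongly for Paris
               [-1.0, 1.0]])  # hidden unit 1 votes weakly for Rome

def forward(x, patch=None):
    """Run the network. `patch` maps hidden-unit index -> value to
    overwrite after the ReLU (this is the activation patch)."""
    h = np.maximum(W1.T @ x, 0.0)
    if patch:
        for i, v in patch.items():
            h[i] = v
    logits = W2.T @ h
    return h, logits

labels = ["Paris", "Rome"]
clean = np.array([1.0, 0.0])    # "the capital of France is ..."
corrupt = np.array([0.0, 1.0])  # "the capital of Italy is ..."

# 1) Cache activations from the clean run.
h_clean, _ = forward(clean)

# 2) Baseline corrupted run: the model answers Rome.
_, base_logits = forward(corrupt)
print(labels[int(np.argmax(base_logits))])  # Rome

# 3) Patch a single hidden activation (unit 0) from the clean run
#    into the corrupted run: the answer flips back to Paris.
_, patched_logits = forward(corrupt, patch={0: h_clean[0]})
print(labels[int(np.argmax(patched_logits))])  # Paris
```

In a real model the same loop runs over every layer and position, patching one activation at a time; the point in the transcript is that most patches barely move the logits, while a handful flip the answer outright.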