AXRP - the AI X-risk Research Podcast

24 - Superalignment with Jan Leike



Automated Interpretability for Neurons

This is work that we kind of started last year, and we released a paper on automated interpretability earlier this year. The idea is basically: what you would want is a technique that works at the level of detail of individual neurons, so that you can make sure you don't miss any details. The way to then scale that to the entire model is automation, right? But you can do that: once you figure out how to do it at that level of detail, then you just record what you're doing.
