The Basic Physics of Interpretability

The interpretability work I've described so far is more of a top-down kind of interpretability. Most of the time, though, when people talk about interpretability, they mean mechanistic interpretability, which we can think of as the basic physics version of interpretability. There's a lot of good work on this topic, and the pioneering work in this area is awesome. Neel Nanda, whom you may have seen online, is very active and has done some really interesting work here. For example, sometimes it seems like neural networks suddenly understand the thing