AI Alignment as a Solvable Problem | Leopold Aschenbrenner & Richard Hanania

CSPI Podcast

The Basic Physics of Interpretability

The interpretability work I've described so far is more the top-down kind of interpretability. Most of the time when people talk about interpretability, they mean mechanistic interpretability. So basically, think of this as sort of the basic physics version of interpretability. [unclear], you know, sort of the pioneer of this, has done awesome work. Neel Nanda is a person who maybe you've seen online; he's active and has done some really interesting work on this. For example, sometimes it seems like neural networks suddenly understand the thing
