How EleutherAI Trains and Releases LLMs: Interview with Stella Biderman

Gradient Dissent: Conversations on AI

Mechanistic Interpretability in AI

There are a couple of labs doing some really phenomenal work in this area. The basic idea is that you want to understand how the models function, what kinds of properties they have, and where those properties came from. There's been some really, really wonderful work by Colin Raffel on understanding what the training data does to language model performance.

