
Jesse Hoogland on Developmental Interpretability and Singular Learning Theory
The Inside View
The Problem With Interpretability in Large Systems
The problem, obviously, with very large systems is: how do you figure out all the things that are going on inside of a neural network? Maybe you can find many of the big-picture things, but it's very hard to find all the little details. Developmental interpretability proposes that we study how structure forms over the course of training. And I think maybe it's more tractable to find out what's going on in the neural network at the end if we just understand each individual transition over the course of training. That might be much more tractable than trying to understand the structure as it is at the end of training.