
"(My understanding of) What Everyone in Technical Alignment is Doing and Why" by Thomas Larsen & Eli Lifland
LessWrong (Curated & Popular)
Neural Networks and Interpretability
Chris Olah, the interpretability legend, is working on looking really hard at all the neurons to see what they all mean. The approach he pioneered is circuits: looking at computational subgraphs of the network and interpreting those, with the idea of decompiling the network into a better representation that is more interpretable. One result I heard about recently: a softmax linear unit (SoLU) stretches the activation space and encourages neuron monosemanticity, making a neuron represent only one concept, as opposed to firing on many unrelated concepts. This makes the network easier to interpret.
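To make the SoLU idea concrete, here is a minimal sketch of the activation as defined in Anthropic's Softmax Linear Units work, SoLU(x) = x * softmax(x). The function name and NumPy framing are illustrative choices, not taken from the episode, and the full architecture also applies a LayerNorm after the activation, which is omitted here:

```python
import numpy as np

def solu(x):
    # SoLU (Softmax Linear Unit): elementwise product of the activation
    # vector with its own softmax. Large activations are amplified and
    # small ones are suppressed, which empirically pushes each neuron
    # toward representing a single concept (monosemanticity).
    e = np.exp(x - np.max(x))   # numerically stable softmax
    return x * (e / e.sum())

# Example: the largest activation dominates, the rest shrink toward zero.
print(solu(np.array([4.0, 1.0, 0.5, 0.2])))
# -> approximately [3.63, 0.045, 0.014, 0.004]
```

The sharpening effect is why this is thought to help interpretability: after SoLU, a hidden vector tends to be dominated by one neuron rather than spreading its signal across many.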


