
"(My understanding of) What Everyone in Technical Alignment is Doing and Why" by Thomas Larsen & Eli Lifland


Neural Networks and Interpretability

Chris Olah, the interpretability legend, is working on looking really hard at all the neurons to see what they all mean. The approach he pioneered is circuits: looking at computational subgraphs of the network and interpreting those, the idea being to decompile the network into a better representation that is more interpretable in context. One result I heard about recently: the softmax linear unit (SoLU) stretches activation space and encourages neuron monosemanticity, making a neuron represent only one thing, as opposed to firing on many unrelated concepts. This makes the network easier to interpret.
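For concreteness, here is a minimal sketch of the SoLU activation, which computes x * softmax(x) over the hidden dimension. The module name, the layer sizes, and the exact placement of the trailing LayerNorm are illustrative assumptions rather than the precise published implementation.

```python
import torch
import torch.nn as nn

class SoLU(nn.Module):
    """Softmax Linear Unit: x * softmax(x) over the hidden dimension.

    The softmax sharpens the activation pattern so a few neurons
    dominate for any given input, which empirically encourages
    monosemantic neurons (roughly one neuron per concept).
    """

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * torch.softmax(x, dim=-1)

# Illustrative usage inside an MLP block (sizes are assumptions):
mlp = nn.Sequential(
    nn.Linear(512, 2048),
    SoLU(),
    nn.LayerNorm(2048),  # a LayerNorm after the activation, as in the SoLU write-up
    nn.Linear(2048, 512),
)

x = torch.randn(4, 512)
print(mlp(x).shape)  # torch.Size([4, 512])
```

Because the softmax suppresses all but the largest pre-activations, a neuron that fires does so only when its feature is the strongest signal present, which is what pushes it toward representing a single concept.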
