
Neel Nanda on Avoiding an AI Catastrophe with Mechanistic Interpretability
Future of Life Institute Podcast
Reverse-Engineered Induction Heads Within Mechanistic Interpretability So Far
Induction heads are part of an attention layer, where the attention layer is built up of these heads that can kind of be thought of as acting independently. And because it's an attention layer, this is the model doing something sophisticated with finding relevant information and moving it around. The task being done is: a fact about text is that it often contains repeated text. If "Michael Jackson" came up in the past and the model sees "Michael" again, it predicts that "Jackson" comes next.
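A minimal sketch of the induction pattern described here, as a toy algorithm rather than anything from the episode or Neel Nanda's code: look back for an earlier occurrence of the current token and copy whatever followed it. The function name and example tokens are illustrative assumptions.

```python
# Toy illustration of the induction-head behavior (an assumption-laden sketch,
# not how a transformer literally implements it): if the current token appeared
# earlier in the sequence, predict the token that followed it last time.

def induction_prediction(tokens):
    """Return a predicted next token by matching the last token against earlier
    occurrences, or None if it has not been seen before."""
    current = tokens[-1]
    # Scan earlier positions from most recent to oldest.
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == current:
            return tokens[i + 1]  # copy what followed the earlier occurrence
    return None

# "Michael Jackson ... Michael" -> predicts "Jackson"
print(induction_prediction(["Michael", "Jackson", "went", "on", "tour", "Michael"]))
```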