Neel Nanda on Avoiding an AI Catastrophe with Mechanistic Interpretability

Future of Life Institute Podcast

CHAPTER

Reverse Engineered Induction Heads Within Mechanistic Interpretability So Far

Induction heads are part of an attention layer, which is built up from heads that can each be thought of as acting independently. Because it's an attention layer, this is the model doing something sophisticated: finding relevant information and moving it around. The task relies on a fact about text, which is that it often contains repeated text. If "Michael Jackson" came up earlier, then when "Michael" appears again, "Jackson" is likely to come next.
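The repeated-text pattern described here can be sketched in plain Python. This is only a toy illustration of the behaviour attributed to induction heads (look back for an earlier occurrence of the current token, then copy whatever followed it). It is not how a transformer computes this internally, and the function name `induction_guess` is a hypothetical one chosen for this example.

```python
from typing import Optional, Sequence


def induction_guess(tokens: Sequence[str]) -> Optional[str]:
    """Guess the next token by copying what followed the most recent
    earlier occurrence of the current (last) token.

    Returns None when the current token has not appeared before.
    """
    if not tokens:
        return None
    current = tokens[-1]
    # Scan backwards over earlier positions, skipping the final token itself.
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == current:
            # Found the earlier occurrence: copy the token that came after it.
            return tokens[i + 1]
    return None


if __name__ == "__main__":
    context = ["Michael", "Jackson", "sang", "and", "Michael"]
    print(induction_guess(context))
```

In a real model, the "find the earlier occurrence" step and the "copy what came after" step are carried out by attention heads that move information between positions. The loop above only mimics the input-output behaviour.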

