Reverse Engineered Induction Heads Within Mechanistic Interpretability So Far

induction heads are part of an attention layer where the attention layer is built up over these heads that can kind of be thought of as independently. And because it's an attention layer, this is the model doing something sophisticated with finding relevant information and moving it around. The task being done is so a fact about text is it often contains repeated text. If Michael Jackson came up in the past, this comes next.

Transcript

Play full episode

The AI-powered Podcast Player

Save insights by tapping your headphones, chat with episodes, discover the best highlights - and more!

Get the app