
Catherine Olsson and Nelson Elhage: Anthropic, Understanding Transformers
The Gradient: Perspectives on AI
Induction Heads in Larger Models - The Key Takeaway
In this paper, we explore the hypothesis that induction heads might constitute the mechanism for the majority of in-context learning. Chris was studying some small transformer models and happened to look at a snapshot earlier in training than he was used to. He couldn't find the induction heads, and he was mystified until he realized that he had loaded the wrong snapshot. Between the earlier snapshot and the one he usually looked at was a visible bump in the loss curve, where all of a sudden the model improved. That was also the point at which the in-context learning score got dramatically better.