
19 - Mechanistic Interpretability with Neel Nanda
AXRP - the AI X-risk Research Podcast
Induction Heads in a Two Layer Model
Redwood Research have been doing this work with causal scrubbing; they're trying to build these rigorous techniques. There's a circuit, found in two-layer attention-only models, that looks at the current token and says: has this token appeared in the past? If yes, then let's assume the thing that came after it is going to come next.

Yeah, so you could imagine that if you've got a token "James" and you want to figure out what comes next, you're like: is this a piece about James Bond, or just some random dude called James? So we want the induction head to attend from "James" to the first occurrence of "Bond", because "Bond" is preceded by "James", which is a copy of the current token.
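The rule described above can be written out as a small toy sketch. This is not how a transformer computes it (a real induction head does this with attention weights across two layers); it is only the input-output behaviour spelled out in plain Python. The function name `induction_predict` and the choice to use the most recent earlier match are assumptions made for illustration.

```python
from typing import Optional, Sequence, Tuple


def induction_predict(tokens: Sequence[str]) -> Optional[Tuple[int, str]]:
    """Toy version of the induction rule (hypothetical helper, not a real model).

    Looks for an earlier copy of the current (last) token. If one exists,
    returns (attended_position, predicted_token), where attended_position is
    the position right after the earlier copy and predicted_token is the
    token found there. Returns None if the current token has not appeared
    before.
    """
    if not tokens:
        return None
    current = tokens[-1]
    # Assumption: scan backwards so the most recent earlier copy wins.
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == current:
            # Attend to the token that followed the earlier copy
            # and guess that it comes next.
            return i + 1, tokens[i + 1]
    return None


text = ["James", "Bond", "walked", "in", ".", "James"]
print(induction_predict(text))
```

In the example, the current token is the second "James"; the earlier "James" sits at position 0, so the rule attends to position 1 and guesses "Bond".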