
19 - Mechanistic Interpretability with Neel Nanda
AXRP - the AI X-risk Research Podcast
How Do Induction Heads Work?
There's a fun thing in the paper where you can go play around with the attention pattern. I found a couple of heads like that in GPT-J, and figuring out what's up with those is on my long-term to-do list. It's one of the problems in my concrete open problems sequence, so if anyone wants to go try that, I'm very excited to see what you find. Anyway, yeah, the translation head is also an induction head, and my guess is it's just the same fundamental algorithm: map things to a latent space, look for matches, and look at the thing immediately after the match. The model has just learned how to do something sensible here.

Is it that the same head does French to English, English
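
The "map to a latent space, look for matches, look at the thing immediately after the match" description can be sketched as a toy function. This is only an illustration, not how a transformer head is actually implemented: the name `induction_predict`, the `embed` parameter (standing in for the latent-space mapping), and the tiny word-to-concept dictionary are all assumptions made up for this example.

```python
from typing import Callable, Hashable, Optional, Sequence


def induction_predict(
    tokens: Sequence[Hashable],
    embed: Callable[[Hashable], Hashable] = lambda t: t,
) -> Optional[Hashable]:
    """Toy induction step: find an earlier match, return what followed it.

    `embed` is a hypothetical stand-in for mapping tokens to a latent
    space; the default (identity) gives plain exact-match induction.
    """
    if not tokens:
        return None
    query = embed(tokens[-1])
    # Scan earlier positions from most recent to oldest.
    for i in range(len(tokens) - 2, -1, -1):
        if embed(tokens[i]) == query:
            # Look at the thing immediately after the match.
            return tokens[i + 1]
    return None


# Exact-match induction: "A B C A" suggests "B" comes next.
print(induction_predict(["A", "B", "C", "A"]))

# Translation-flavoured variant: a made-up shared "concept" space lets
# "cat" match the earlier "chat", so the step lands on the token after it.
concepts = {"chat": "CAT", "cat": "CAT"}
print(induction_predict(
    ["le", "chat", "dort", "the", "cat"],
    embed=lambda t: concepts.get(t, t),
))
```

In the second call the function returns the token it lands on in the earlier sequence, not a translation of it. Turning that attended-to token into an output in the other language would be a separate step that this sketch leaves out.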