19 - Mechanistic Interpretability with Neel Nanda

AXRP - the AI X-risk Research Podcast

CHAPTER

How Do Induction Heads Work?

There's a fun thing in the paper where you can go play around with the attention pattern. I found a couple of heads like that in GPT-J, and figuring out what's up with those is on my long-term to-do list. Okay. It's one of the problems in my open problems sequence, so if anyone wants to go try that, I'm very excited to see what you find. Anyway. Yeah, and the translation head is also an induction head, and my guess is it's just the same fundamental algorithm: map things to a latent space, look for matches, look at the thing immediately after the match. And the model has just learned how to do something sensible here. Is it that the same head does French to English, English
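The "look for matches, look at the thing immediately after the match" idea can be illustrated with a toy sketch. This is only a behavioral caricature on raw tokens, under the assumption that exact token equality stands in for matching in a latent space; the function name `induction_predict` is hypothetical and nothing here reflects how a real attention head computes this.

```python
def induction_predict(tokens):
    """Toy induction rule: find the most recent earlier occurrence of the
    current (last) token and return the token that followed it.

    Assumption: exact equality stands in for "matching in latent space".
    Returns None when the current token has not appeared before.
    """
    if not tokens:
        return None
    current = tokens[-1]
    # Scan earlier positions from most recent to oldest, skipping the last one.
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == current:
            # Copy whatever came immediately after the match.
            return tokens[i + 1]
    return None


if __name__ == "__main__":
    print(induction_predict(["A", "B", "C", "A"]))
```

On the sequence `A B C A` the sketch returns `B`, the token that followed the earlier `A`. A translation-style variant would need a notion of matching across languages, which this toy does not attempt.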
