
19 - Mechanistic Interpretability with Neel Nanda

AXRP - the AI X-risk Research Podcast

CHAPTER

The Three Papers in Which You've Helped Reverse Engineer a Network

The first part of the interview focuses on "A Mathematical Framework for Transformer Circuits". The second half looks at "Progress Measures for Grokking via Mechanistic Interpretability", the independent research Neel did after leaving Anthropic. He also talks about his work with Chris Olah, Nelson Elhage, and Catherine Olsson on "In-context Learning and Induction Heads".
