
19 - Mechanistic Interpretability with Neel Nanda

AXRP - the AI X-risk Research Podcast


The Three Papers in Which You've Helped Reverse Engineer a Network

The first part of the interview focuses on "A Mathematical Framework for Transformer Circuits". The second half looks at "Progress Measures for Grokking via Mechanistic Interpretability", independent research Neel did after leaving Anthropic. He also discusses his work with Chris Olah, Nelson Elhage, and Catherine Olsson on in-context learning and induction heads. At the bottom of the page, you can share your own insights into these three papers.
