Neel Nanda on mechanistic interpretability, superposition and grokking

The Inside View

Mechanistic Interpretability in AI

This chapter explores mechanistic interpretability in AI: the challenge and importance of understanding the algorithms that neural networks learn. It works through a concrete example, showing how a one-layer transformer performs modular addition using rotations around the unit circle, and also touches on induction heads and the attention mechanism within transformers.
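
To make the rotation trick concrete, here is a minimal numerical sketch of the underlying identity (illustrative only, not the network's actual weights): mapping each residue to a point on the unit circle turns modular addition into angle addition. The values of `p`, `a`, and `b` are example choices, not taken from the episode.

```python
import numpy as np

# Illustrative sketch: computing (a + b) mod p by composing rotations
# around the unit circle, the kind of algorithm the one-layer
# transformer was found to learn for modular addition.
p = 113          # example modulus (used in the grokking experiments)
a, b = 41, 97    # example inputs

# Represent each residue x as the unit complex number exp(2*pi*i*x/p).
rot_a = np.exp(2j * np.pi * a / p)
rot_b = np.exp(2j * np.pi * b / p)

# Multiplying unit complex numbers adds their angles, so the angle of
# the product encodes (a + b) mod p.
angle = np.angle(rot_a * rot_b) % (2 * np.pi)
result = round(angle * p / (2 * np.pi)) % p

assert result == (a + b) % p
print(result)  # 25
```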
