
Neel Nanda on mechanistic interpretability, superposition and grokking

The Inside View

CHAPTER

Mechanistic Interpretability in AI

This chapter explores mechanistic interpretability in AI, discussing the challenges and importance of understanding the algorithms learned by neural networks. It delves into specific examples, such as how a one-layer transformer performs modular addition using rotations around the unit circle. The chapter also touches on induction heads and the attention mechanism within transformers.
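A minimal Python sketch of that rotation idea, illustrating the algorithm the episode describes rather than the trained network's actual weights: each residue modulo p is mapped to an angle on the unit circle, the two rotations are composed by adding their angles, and the answer is read off as the residue whose angle best matches the composition. The modulus p = 113 and the cosine readout are illustrative assumptions, not details taken from the episode.

import numpy as np

p = 113  # illustrative modulus; any prime works for this sketch

def angle(x: int) -> float:
    # Map a residue x mod p to its angle on the unit circle.
    return 2 * np.pi * (x % p) / p

def modular_add_via_rotation(a: int, b: int) -> int:
    # Composing two rotations adds their angles, so the composed angle
    # encodes a + b; the argmax over cosine similarity picks out the
    # residue whose own angle lines up with it, i.e. (a + b) mod p.
    composed = angle(a) + angle(b)
    candidates = np.arange(p)
    scores = np.cos(composed - 2 * np.pi * candidates / p)
    return int(candidates[np.argmax(scores)])

assert modular_add_via_rotation(57, 80) == (57 + 80) % p

The claim discussed in the episode is that the trained one-layer transformer converges on an algorithm equivalent to this angle-composition-and-readout scheme.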
