

Catherine Olsson and Nelson Elhage: Anthropic, Understanding Transformers
Aug 26, 2022
Chapters
Introduction
00:00 • 2min
Anthropic
02:09 • 3min
Anthropic and Interpretability - What's Next?
05:07 • 3min
Is Mechanistic Interpretability a Good Idea?
08:02 • 5min
You've Got a Program on Mechanistic Interpretability
12:50 • 5min
The OV Matrix and the Path Expansion Frame
18:05 • 2min
Toy Transformers - How Do You Even Try to Do This?
20:17 • 2min
The Basics of Attention Only Models
21:58 • 5min
What Are Induction Heads?
26:44 • 2min
Induction Heads
29:02 • 2min
What Are In-Context Learning and Induction Heads?
30:38 • 2min
The In-Context Learning Score
32:29 • 3min
Induction Heads in Larger Models - The Key Takeaway
35:09 • 4min
Getting Traction in the MLP Layers
38:51 • 4min
How to Replicate the Induction Heads Findings
42:35 • 2min
The Gradient Podcast - Part 2
44:53 • 2min