

19 - Mechanistic Interpretability with Neel Nanda
Feb 4, 2023
Chapters
Introduction
00:00 • 2min
What's Happening in the Final Neural Network?
02:17 • 2min
Is Mechanistic Interpretability the Only Path to Understanding Neural Networks?
04:18 • 2min
The Science of Deep Learning and Mechanistic Interpretability
06:10 • 5min
Is There a Threshold for Not Publishing Mechanistic Interpretability Work?
11:05 • 5min
How Scale-Invariant Should We Think of These Insights as Being?
16:10 • 2min
Scaling Laws and Deep Learning
17:48 • 4min
Scaling Laws Are Less Useful for AI X-Risk Reduction or AI Alignment
21:40 • 3min
Is There a Spectrum of Cognitive Abilities?
24:15 • 2min
Language Model Interpretability
25:48 • 5min
The Second Messy Thing to Bear in Mind When Using Transformers
30:44 • 3min
What's the Difference Between Sensory Processing and Reasoning?
33:25 • 2min
Is the Eiffel Tower Located in Paris?
35:08 • 2min
Using MLPs to Reverse Engineer the Network
37:15 • 3min
The Modeling of Attention Heads in Image Models
40:00 • 5min
Is It Possible to Do the Same in Image Models?
45:03 • 2min
Using a Multilayer Perceptron to Train a Linear Map in a Vision Transformer
46:52 • 3min
AI Learning How to Do What You're Doing
49:59 • 2min
Is Its Output Not Interpretable?
51:59 • 2min
Automated Machine Learning and Mechanistic Interpretability in a Neural Network
53:41 • 2min
How Close Do You Think We Are to Automation at Any Level of the Spectrum?
56:10 • 4min
Activation Patching Is a Great Way to Find Out What a Neuron Does
01:00:38 • 3min
Using GPT-2 to Find a Neuron
01:03:35 • 2min
Red Teaming Mechanistic Interpretability Research
01:05:58 • 4min
How to Get Into the Field of Mechanistic Interpretability
01:09:57 • 2min
The Three Papers Where You've Helped Reverse Engineer a Network
01:11:49 • 3min
Reverse Engineering a Transformer and Induction Heads
01:14:50 • 5min
How to Train a Smaller Model to Grok Modular Addition
01:19:51 • 4min
How to Get Higher Reward in a Way That You Didn't Think Possible
01:23:35 • 2min
Anthropic Contribution Statements
01:25:20 • 2min
Is This Path Analysis Going to Be Too Unwieldy to Be Useful?
01:27:18 • 4min
Reverse Engineering and Networks
01:31:12 • 3min
Softmax, Keys, Values, and Attention Heads
01:33:58 • 5min
MLPs Are Really Hard, So What's Going on Here?
01:38:54 • 6min
The Key Takeaway From This Paper Is That Attention Is a Parameterized Matrix
01:44:53 • 6min
Using the Token Embeddings in a Model Is a Good Idea
01:50:41 • 2min
Using Contextual Information in Model Composition
01:52:24 • 3min
How to Use QK and V Composition in Path Analysis
01:55:22 • 4min
Induction Heads in a Two Layer Model
01:58:53 • 3min
The Induction Head in a Two-Layer Attention-Only Model
02:02:16 • 3min
Q Composition Is Using Previously Computed Information
02:05:23 • 2min
Why Do You Think Induction Was the First Thing to Emerge?
02:07:25 • 4min
In-Context Learning and Induction Heads
02:11:05 • 2min
Is There More Than One Induction Head?
02:12:54 • 2min
Are Induction Heads Relevant to In-Context Learning?
02:15:18 • 4min
Using Induction Heads to Match Translation Heads
02:19:48 • 2min
How Do Induction Heads Work?
02:21:34 • 2min
Indirect Object Identification
02:23:42 • 2min
Are Induction Heads Different Kinds of Things?
02:25:34 • 2min
How Many Induction Heads Do You Have?
02:27:14 • 2min
Induction Heads Paper - Grokking
02:29:30 • 2min
Is the Fourth Line of Evidence Really the Case?
02:31:03 • 2min
The Correlation Between the Types of Evidence
02:33:30 • 3min
The Induction Heads in Large Models Aren't as Important as They Used to Be
02:36:35 • 2min
The Principal Component Analysis of Induction Heads
02:38:48 • 2min
The Losses Depend on the Log Prob of the Correct Next Token
02:40:33 • 2min
Is the Principal Axis of the Models Positive or Negative?
02:42:23 • 2min
Is That the First Principal Component?
02:44:12 • 2min
PCA - Is There a Kink?
02:45:43 • 2min
How to Improve the Loss of a Token
02:47:17 • 2min
Is There Any Light Shone on This Mystery?
02:49:22 • 2min
ICLR - Random Loss in a One-Layer Transformer
02:50:58 • 3min
The Modular Addition Algorithm
02:53:32 • 4min
Modular Addition Algorithm
02:57:08 • 2min
That's the Basis of This Algorithm
02:59:04 • 2min
The Basic Algorithm of a Transformer
03:00:46 • 3min
Using Mod-113 Arithmetic, You're Learning the Sine Function on 113 Data Points
03:04:00 • 2min
Generalizing or Memorizing?
03:06:01 • 4min
Using the Trigonometric Algorithm to Define a Second Progress Measure - Excluded Loss
03:09:46 • 2min
Is There a Sudden Drop in Test Loss?
03:11:38 • 2min
Is There Something Weird About the Optimizer?
03:13:52 • 2min
The Third Reason Why You Shouldn't Expect Phase Transitions With Adam-Based Optimizers
03:15:57 • 5min
Training a Neural Network and the Lottery Ticket Hypothesis
03:21:00 • 3min
Is It a Phase Transition?
03:24:16 • 3min
"Sharp Left Turn" Is the New Hot Phrase for This
03:27:31 • 2min
Modular Addition in a Toy One-Layer Transformer
03:29:42 • 4min
What's Up With the MLP Layers?
03:34:09 • 5min
Reverse Engineering Models on Reinforcement Learning Problems
03:38:39 • 5min
Aren't Our Networks Just Fundamentally Not Interpretable?
03:43:59 • 5min
How Can People Follow You on Twitter?
03:49:29 • 3min