Chapters
Introduction
00:00 • 2min
The Risks of Advanced AI Systems
01:40 • 2min
How I Learned to Work in Deep Learning
03:37 • 4min
The Functional Unit of a Deep Neural Network
08:06 • 5min
The Quantization Model of Neural Scaling
13:02 • 2min
How to Predict a Token in Natural Language
15:13 • 5min
How to Predict Power Laws on Tokens
20:07 • 2min
How to Cluster a Tokenizer
22:14 • 2min
The Power Law Governing How Useful Knowledge Is for Prediction
24:35 • 2min
The Power of Clustering in Language Models
26:53 • 2min
The Power of Clusters in Language Models
28:38 • 3min
The Chinchilla Scaling Law
31:49 • 2min
The Messier Scaling Exponents of OpenAI in the Wild
33:20 • 2min
The Quantum Interpretability of Scaling Exponents
34:57 • 2min
Grokking in Neural Networks
37:07 • 3min
The Importance of Generalization in Network Learning
39:53 • 2min
The Learning of Modular Representations in Networks
41:57 • 2min
Omnigrok: Grokking Beyond Algorithmic Data
43:33 • 2min
The Overarching Message of Grokking
45:17 • 3min