Eric Michaud on scaling, grokking and quantum interpretability

1

Introduction

00:00 • 2min

2

The Risks of Advanced AI Systems

01:40 • 2min

3

How I Learned to Work in Deep Learning

03:37 • 4min

4

The Functional Unit of a Deep Neural Network

08:06 • 5min

5

The Quantization Model of Neural Scaling

13:02 • 2min

6

How to Predict a Token in Natural Language

15:13 • 5min

7

How to Predict Power Laws on Tokens

20:07 • 2min

8

How to Cluster a Tokenizer

22:14 • 2min

9

The Power Law Governing How Useful Knowledge Is for Prediction

24:35 • 2min

10

The Power of Clustering in Language Models

26:53 • 2min

11

The Power of Clusters in Language Models

28:38 • 3min

12

The Chinchilla Scaling Law

31:49 • 2min

13

The Messier Scaling Exponents of OpenAI in the Wild

33:20 • 2min

14

The Quantum Interpretability of Scaling Exponents

34:57 • 2min

15

Grokking in Neural Networks

37:07 • 3min

16

The Importance of Generalization in Network Learning

39:53 • 2min

17

The Learning of Modular Representations in Networks

41:57 • 2min

18

Omnigrok: Grokking Beyond Algorithmic Data

43:33 • 2min

19

The Overarching Message of Grokking

45:17 • 3min