Hattie Zhou, Mila: Supermasks, iterative learning, and fortuitous forgetting

1

Introduction

00:00 • 2min

2

Is the Model Doing Some Kind of Implicit Curriculum Learning While It's Being Trained?

02:09 • 2min

3

Is There a Trick for Generalization?

04:13 • 3min

4

How to Get an Extremely Adversarial Initialization

06:45 • 2min

5

The Lottery Ticket Hypothesis

08:38 • 2min

6

How to Train a Sparse Network?

10:52 • 2min

7

The Supermask

13:05 • 5min

8

Is There a Way to Change Architectures?

18:01 • 4min

9

How to Find Sub-Networks That Are the Right Shape of the Solution?

21:44 • 4min

10

Is There a Sparse Architecture in Computer Vision?

25:34 • 2min

11

What's the Biggest Takeaway From the Lottery Ticket Hypothesis Paper?

27:47 • 2min

12

Is There a Better Way to Train Supermasks?

29:55 • 2min

13

Is There a Way to Control the Behavior of Pre-Trained Models?

31:59 • 3min

14

What Is a Not-Correlational?

34:56 • 3min

15

The Zero Values Are Relevant Still. Is That Really a Good Idea?

37:28 • 5min

16

The Story of Coherent Gradients

42:14 • 2min

17

Tendexure, I Love That!

43:48 • 3min

18

Increasing Compositionality Through Iterative Learning

46:55 • 5min

19

Is There a Way to Improve Model Performance?

51:29 • 3min

20

Is It a Dropout Intuition?

54:38 • 2min

21

The Later Layers Are Learning of the Tile of All Features

56:10 • 2min

22

The Fortuitous Forgetting Paper

57:42 • 5min

23

Is Knowledge Evolution Really Useful in Transfer Learning?

01:02:18 • 5min

24

Unsupervised Environment Design

01:07:20 • 3min

25

Are You Using Your Model to Identify Desirable Versus Unwanted Information?

01:09:59 • 4min

26

Is There a Generalization of Desirable Versus Unwanted?

01:13:45 • 3min

27

Getting Rid of Specifics for Spurious Features

01:16:28 • 2min

28

Is There a Difference Between a Cow and a Grass Cow?

01:18:04 • 2min

29

How Do You Interpret a Scene?

01:19:37 • 2min

30

Is There a Way to Unlearn?

01:21:55 • 5min

31

Is There a Difference Between Chris Olah and Grande?

01:26:57 • 3min

32

Is There a Limit to In-Context Learning?

01:30:00 • 3min

33

Is Your Model Not Reasoning?

01:33:00 • 3min

34

Is There a Scalable Compositionality?

01:36:28 • 5min

35

Compositional and Trans-Horror Red Scale Part 2

01:41:42 • 2min

36

Is There a Culture That Makes You More Effective?

01:44:11 • 3min