
Episode 20: Hattie Zhou, Mila, on supermasks, iterative learning, and fortuitous forgetting
Generally Intelligent
How to Train a Sparse Network?
In your paper, why is the magnitude pruning algorithm the right way to identify these lottery tickets? It seems very simple. So why are these weights that end up large also good if you train them in isolation from their initializations? That was the main question, I think, that I wanted to answer. And the other questions are: what is it about the subnetwork that is important? Why is this particular subnetwork good for the training process? And could other ways of identifying subnetworks also work? But as we were exploring and finding new things, new questions came up.
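For context on the pruning step discussed here, the following is a minimal sketch (my own illustration under assumed conventions, not code from the paper) of how magnitude pruning produces a sparse mask: after training, the weights with the largest magnitudes are kept, and in the lottery-ticket procedure that mask is applied back to the original initialization before the subnetwork is retrained. The function and variable names are hypothetical.

```python
import torch

def magnitude_prune_mask(weights: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Return a binary mask keeping the largest-magnitude weights.

    `sparsity` is the fraction of weights to remove (e.g. 0.9 keeps 10%).
    """
    flat = weights.abs().flatten()
    k = int(sparsity * flat.numel())        # number of weights to prune
    if k == 0:
        return torch.ones_like(weights)
    threshold = flat.kthvalue(k).values     # k-th smallest magnitude
    return (weights.abs() > threshold).float()

# Example: prune 90% of a layer's weights by magnitude
trained_layer = torch.randn(256, 256)       # stand-in for trained weights
mask = magnitude_prune_mask(trained_layer, sparsity=0.9)
initial_layer = torch.randn(256, 256)       # stand-in for the original initialization
lottery_ticket = initial_layer * mask       # subnetwork to retrain in isolation
print(f"kept {mask.mean().item():.1%} of weights")
```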