The Supermask

The supermask was definitely an accidental discovery from the paper, and I think it's the thing I'm most excited about that came out of that work. Essentially, what we found was that not only are there these subnetworks that you identify through this magnitude pruning process, but they already have good performance even before you train them. It's like 80% on MNIST and 41% on CIFAR-10 or something. The original signed-constant experiment in our paper no longer works when you scale up the dataset and the model. So at ImageNet scale and on large ResNets, things like that, keeping the sign alone no longer works.
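As a rough illustration of the idea described here, the sketch below builds a binary mask from the magnitudes of trained weights and applies it to the untrained initial weights, then shows a sign-only variant. It is a minimal NumPy sketch under stated assumptions, not the paper's code: the function names, the keep fraction, the choice of the initial weights' standard deviation as the constant, and the random stand-in for "trained" weights are all hypothetical.

```python
import numpy as np


def magnitude_mask(w_final, keep_fraction):
    """Binary mask that keeps the largest-magnitude trained weights."""
    k = max(1, int(round(keep_fraction * w_final.size)))
    # The threshold is the k-th largest absolute value in the layer.
    threshold = np.sort(np.abs(w_final), axis=None)[-k]
    return (np.abs(w_final) >= threshold).astype(w_final.dtype)


def apply_mask_to_init(w_init, mask):
    """Untrained network with the mask applied: pruned weights become zero."""
    return w_init * mask


def keep_sign_only(w_init, mask):
    """Sign-only variant: kept weights become +/- one constant.

    Assumption: the constant is the standard deviation of the initial weights.
    """
    const = np.std(w_init)
    return np.sign(w_init) * const * mask


rng = np.random.default_rng(0)
w_init = rng.normal(0.0, 0.1, size=(4, 5))
# Stand-in for trained weights; a real experiment would train the network.
w_final = w_init + rng.normal(0.0, 0.1, size=(4, 5))

mask = magnitude_mask(w_final, keep_fraction=0.2)
masked_init = apply_mask_to_init(w_init, mask)
signed_const = keep_sign_only(w_init, mask)
```

In an actual experiment, `masked_init` would be evaluated on the test set without any further training; the accuracy figures quoted above refer to that kind of evaluation.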
