
Episode 20: Hattie Zhou, Mila, on supermasks, iterative learning, and fortuitous forgetting
Generally Intelligent
Is There a Better Way to Train Supermasks?
I would love to see better ways of training supermasks, or just training masks on top of weights in general, because I think there are a lot of unexplored use cases for these masks. You could imagine that this kind of training process might be less prone to overfitting, for example, since your underlying weights are still coming from a random initialization. Another use case I've seen in some papers is using a mask as a way to probe a trained model: instead of looking for a mask on top of randomly initialized weights, you take a trained model and try to find a mask that corresponds to a certain objective. That's really interesting.
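For concreteness, here is a minimal sketch (not from the episode) of what "training a mask on top of frozen random weights" can look like in PyTorch, in the spirit of the supermask line of work: the weights are frozen at initialization, a score is learned per weight, the top-scoring fraction of weights is kept, and a straight-through estimator lets gradients flow to the scores. The class name, sparsity level, and score initialization are illustrative assumptions, not the method discussed in the episode.

```python
import torch
import torch.nn as nn


class SupermaskLinear(nn.Module):
    """Linear layer with frozen random weights and a learned binary mask.

    A sketch of the supermask idea: weights stay at their random
    initialization; only per-weight scores are trained, and the top-k
    scores decide which weights are "on". Assumes 0 < sparsity < 1.
    """

    def __init__(self, in_features, out_features, sparsity=0.5):
        super().__init__()
        # Frozen random weights: never updated by the optimizer.
        self.weight = nn.Parameter(
            torch.randn(out_features, in_features) * in_features ** -0.5,
            requires_grad=False,
        )
        # Trainable scores from which the mask is derived.
        self.scores = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.sparsity = sparsity

    def forward(self, x):
        # Keep the top (1 - sparsity) fraction of weights by score.
        n = self.scores.numel()
        k = int((1 - self.sparsity) * n)
        threshold = self.scores.flatten().kthvalue(n - k + 1).values
        hard_mask = (self.scores >= threshold).float()
        # Straight-through estimator: the forward pass uses the hard
        # 0/1 mask, while the backward pass routes gradients to the
        # scores as if the masking were the identity.
        mask = hard_mask + self.scores - self.scores.detach()
        return nn.functional.linear(x, self.weight * mask)


layer = SupermaskLinear(784, 256)
# Only the scores are optimized; the weights never change.
optimizer = torch.optim.SGD([layer.scores], lr=0.1)
```

The same score-and-mask construction applies unchanged to the probing use case mentioned above: load pretrained weights into `self.weight` instead of random ones, keep them frozen, and train only the mask against the objective of interest.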