The Importance of Adaptive Sparsity

The data sparsity also seems super interesting. A tricky thing is how do you actually just work in terms of deciding which things to turn on or off, right? Like being so binary, you feel as happy for gradients. I think there are a lot of clever approaches to figure out the sparsity mat. There are approaches like hashing and clustering. Some folks have tried. They show some positive results in some settings. So it ran to be seen if these things can actually work at large scale for the task that maybe a lot of people care about right now, like language modeling.

Play episode from 59:14

Transcript

The AI-powered Podcast Player

Save insights by tapping your headphones, chat with episodes, discover the best highlights - and more!

Get the app