The Importance of Adaptive Sparsity
The data sparsity also seems super interesting. A tricky thing is how you actually decide which things to turn on or off, right? Being so binary, it isn't very friendly to gradients. I think there are a lot of clever approaches to figure out the sparsity mask. There are approaches like hashing and clustering. Some folks have tried them, and they show some positive results in some settings. So it remains to be seen if these things can actually work at large scale for the tasks that maybe a lot of people care about right now, like language modeling.
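To make the hashing idea concrete, here is a minimal sketch of one way a sparsity pattern could be derived from the data itself. It assumes random-hyperplane hashing, which is a common choice but not necessarily what the speaker has in mind. Vectors that land in the same hash bucket are allowed to interact and everything else is switched off. The names `random_hyperplanes`, `hash_bucket`, and `sparsity_mask` are hypothetical helpers made up for this illustration.

```python
import random


def random_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random hyperplane normals (assumption: Gaussian entries)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_bucket(vec, planes):
    """Pack the sign of each projection into an integer bucket id."""
    bucket = 0
    for plane in planes:
        dot = sum(v * p for v, p in zip(vec, plane))
        # Hard sign decision: this step is binary, so no gradient flows through it.
        bucket = (bucket << 1) | (1 if dot >= 0 else 0)
    return bucket


def sparsity_mask(vectors, planes):
    """mask[i][j] is True only when items i and j share a hash bucket."""
    buckets = [hash_bucket(v, planes) for v in vectors]
    n = len(vectors)
    return [[buckets[i] == buckets[j] for j in range(n)] for i in range(n)]


# Toy data: the first two vectors point the same way, the third is the opposite.
vectors = [[1.0, 0.0], [2.0, 0.0], [-1.0, 0.0], [0.0, 1.0]]
planes = random_hyperplanes(dim=2, n_bits=4)
mask = sparsity_mask(vectors, planes)
```

In this toy setup the first two vectors always share a bucket, because scaling a vector by a positive number does not change the sign of any projection, while the opposite-pointing vector ends up in a different one. The hard sign decision in `hash_bucket` is exactly the binary on/off choice that makes gradients awkward. Whether a scheme like this holds up at large scale is the open question raised above.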