
Episode 33: Tri Dao, Stanford: On FlashAttention and sparsity, quantization, and efficient inference
Generally Intelligent
The Importance of Adaptive Sparsity
The data sparsity also seems super interesting. A tricky thing is how you actually make it work in terms of deciding which things to turn on or off, right? Being so binary, it doesn't feel as friendly to gradients. I think there are a lot of clever approaches to figuring out the sparsity mask. There are approaches like hashing and clustering. Some folks have tried them, and they show some positive results in some settings. So it remains to be seen whether these things can actually work at large scale for the tasks that a lot of people care about right now, like language modeling.
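To make the hashing idea concrete, here is a minimal sketch of one way a sparsity mask could be derived from the data: tokens are hashed with random hyperplanes, and a pair is kept only if both tokens land in the same bucket. This is a generic illustration, not the method of any particular paper discussed in the episode; the function name `lsh_sparsity_mask` and its parameters are hypothetical, and NumPy is assumed.

```python
import numpy as np

def lsh_sparsity_mask(x, n_hyperplanes=4, seed=0):
    """Return a boolean (n, n) mask that is True where tokens i and j
    fall into the same random-hyperplane hash bucket.

    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    # One random hyperplane per hash bit.
    planes = rng.standard_normal((x.shape[1], n_hyperplanes))
    # Hard sign test: this binary on/off decision is the step that
    # gradients cannot flow through.
    bits = (x @ planes > 0).astype(np.int64)
    # Pack the bits into a single integer bucket id per token.
    codes = bits @ (1 << np.arange(n_hyperplanes, dtype=np.int64))
    # Keep a pair (i, j) only if both tokens share a bucket.
    return codes[:, None] == codes[None, :]

# Usage: 8 tokens with 16-dimensional features.
x = np.random.default_rng(1).standard_normal((8, 16))
mask = lsh_sparsity_mask(x)
```

The resulting mask is symmetric and always keeps the diagonal, since every token shares a bucket with itself. More hyperplanes give more buckets and therefore a sparser mask.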