
Episode 33: Tri Dao, Stanford: On FlashAttention and sparsity, quantization, and efficient inference
Generally Intelligent
The Importance of Adaptive Sparsity
The data sparsity also seems super interesting. A tricky thing is how you actually learn to decide which things to turn on or off, right? Because it's so binary, it's not as friendly for gradients. I think there are a lot of clever approaches to figure out the sparsity mask, approaches like hashing and clustering that some folks have tried, and they show some positive results in some settings. So it remains to be seen if these things can actually work at large scale for the task that maybe a lot of people care about right now, like language modeling.
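To make the "binary decisions aren't gradient-friendly" point concrete, here is a minimal sketch (not from the episode) of one common workaround: a straight-through estimator, where the forward pass applies a hard 0/1 mask but the backward pass treats the mask as the underlying soft score. Names like `SparsityGate` and `keep_ratio` are illustrative assumptions, not anything Tri describes.

```python
# Minimal straight-through sparsity gate: hard 0/1 mask in the forward pass,
# soft scores in the backward pass so gradients can still learn which features
# to turn on or off. Illustrative sketch only.
import torch
import torch.nn as nn

class SparsityGate(nn.Module):
    """Learns per-feature scores and keeps the top-k features (hard mask),
    while letting gradients flow through the soft scores."""
    def __init__(self, dim: int, keep_ratio: float = 0.5):
        super().__init__()
        self.scores = nn.Parameter(torch.zeros(dim))  # learned gate logits
        self.k = max(1, int(dim * keep_ratio))        # number of features to keep

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        soft = torch.sigmoid(self.scores)             # differentiable scores in (0, 1)
        topk = soft.topk(self.k).indices
        hard = torch.zeros_like(soft)
        hard[topk] = 1.0                              # binary on/off decision
        # Straight-through trick: forward uses `hard`, backward sees `soft`.
        mask = hard + soft - soft.detach()
        return x * mask

# Usage: gate a 16-dim activation, keeping roughly half the features.
gate = SparsityGate(dim=16, keep_ratio=0.5)
out = gate(torch.randn(4, 16))
out.sum().backward()                                  # gradients reach gate.scores
print(gate.scores.grad.shape)                         # torch.Size([16])
```

The hashing and clustering approaches mentioned above take a different route, grouping similar items so that only entries within a bucket or cluster are computed, which sidesteps the need to differentiate through a hard selection.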