Episode 33: Tri Dao, Stanford: On FlashAttention and sparsity, quantization, and efficient inference



The Impact of Sparse Training on Language Models

We found a parameterization involving sparsity that's both expressive and hardware-efficient. We were able to use that to train image classification models, say on ImageNet, and it gives an actual wall-clock speedup while maintaining the same level of quality. Now these things are being used at SambaNova, which is a chip company, and a lot of work is now building on the idea of sparse training.
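For context, the parameterization described here is in the family of structured sparsity (butterfly/Monarch-style matrices) from Tri Dao's work. The sketch below is a hypothetical, simplified illustration of the idea, not his actual implementation: a dense weight matrix is replaced by two block-diagonal matrices with a transpose in between, so every compute step is a batch of small dense matmuls that accelerators run efficiently.

```python
import torch
import torch.nn as nn


class MonarchStyleLinear(nn.Module):
    """Toy Monarch-style sparse layer: two block-diagonal multiplies
    separated by a transpose of the block grid. Illustrative sketch only."""

    def __init__(self, dim: int, nblocks: int):
        super().__init__()
        assert dim % nblocks == 0, "dim must be divisible by nblocks"
        self.nblocks = nblocks
        self.bsize = dim // nblocks
        # Total parameters: dim*bsize + dim*nblocks, versus dim*dim dense.
        self.blocks1 = nn.Parameter(
            torch.randn(nblocks, self.bsize, self.bsize) * self.bsize**-0.5)
        self.blocks2 = nn.Parameter(
            torch.randn(self.bsize, nblocks, nblocks) * nblocks**-0.5)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, d = x.shape
        # 1) Block-diagonal multiply within contiguous feature blocks.
        x = x.view(b, self.nblocks, self.bsize)
        x = torch.einsum("bni,nio->bno", x, self.blocks1)
        # 2) Transpose the block grid so the next stage mixes across blocks.
        x = x.transpose(1, 2)  # (b, bsize, nblocks)
        # 3) Second block-diagonal multiply, then restore the layout.
        x = torch.einsum("bmi,mio->bmo", x, self.blocks2)
        return x.transpose(1, 2).reshape(b, d)


layer = MonarchStyleLinear(dim=1024, nblocks=32)
y = layer(torch.randn(8, 1024))  # shape (8, 1024)
```

With dim=1024 and nblocks=32, this layer has 65,536 parameters against roughly 1M for a dense 1024x1024 matrix, a ~16x reduction. Unlike unstructured sparsity, each stage stays a dense batched matmul, which is why this kind of structure can deliver real wall-clock speedups rather than only theoretical FLOP savings.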
