
Episode 33: Tri Dao, Stanford: On FlashAttention and sparsity, quantization, and efficient inference

Generally Intelligent


The Impact of Sparse Training on Language Models

We found some parameterization that involves sparsity that's expressive and hardware efficient. We were able to use that to train image classification models, say on ImageNet, and it gives an actual wall-clock speedup while maintaining the same level of quality. Now these things are being used at SambaNova, which is a chip company. And lots of work is now building on the idea of sparse training.
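To make the idea concrete: one family of sparse-but-hardware-efficient parameterizations replaces a dense weight matrix with a product of block-diagonal matrices and a fixed permutation, in the spirit of Tri Dao's butterfly/Monarch line of work. The sketch below is illustrative only, not the episode's or any paper's actual code; the class names, block count, and permutation scheme are assumptions for demonstration.

```python
import torch
import torch.nn as nn

class BlockDiagLinear(nn.Module):
    """Linear map whose weight is restricted to `nblocks` independent
    dense blocks, so a d x d matmul costs O(d^2 / nblocks) FLOPs
    instead of O(d^2), while each block still maps well onto GPU
    tensor cores (structured, hardware-friendly sparsity)."""
    def __init__(self, dim: int, nblocks: int):
        super().__init__()
        assert dim % nblocks == 0
        self.nblocks, self.bdim = nblocks, dim // nblocks
        self.weight = nn.Parameter(
            torch.randn(nblocks, self.bdim, self.bdim) / self.bdim ** 0.5
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, dim)
        b = x.shape[0]
        x = x.view(b, self.nblocks, self.bdim)   # split features into blocks
        # Batched per-block matmul: each block gets its own small dense matrix.
        y = torch.einsum("bnd,nde->bne", x, self.weight)
        return y.reshape(b, -1)

class StructuredSparseLinear(nn.Module):
    """Two block-diagonal maps with a fixed stride permutation in
    between, so information mixes across blocks (a Monarch-style
    construction, used here purely as an illustration)."""
    def __init__(self, dim: int, nblocks: int):
        super().__init__()
        self.l1 = BlockDiagLinear(dim, nblocks)
        self.l2 = BlockDiagLinear(dim, nblocks)
        # "Transpose" permutation interleaves the blocks between the two maps.
        self.register_buffer(
            "perm", torch.arange(dim).view(nblocks, -1).t().reshape(-1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.l2(self.l1(x)[:, self.perm])

# Example: a 1024-wide layer with 32 blocks uses 2 * 1024 * 32 = 65,536
# multiply-adds per token versus ~1M for a dense 1024 x 1024 matmul.
layer = StructuredSparseLinear(dim=1024, nblocks=32)
y = layer(torch.randn(8, 1024))  # -> shape (8, 1024)
```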

