
Episode 33: Tri Dao, Stanford: On FlashAttention and sparsity, quantization, and efficient inference


Sparse vs Dense Matrices for Different Applications

- Sparse matrices can work well for image classification models, but may not perform as well for language models trained on large datasets.
- For applications that can leverage fast transforms, such as audio or image classification with convolutions, structured matrices like Monarch can work well (see the sketch below).
- For applications with less structure, such as language models, denser matrices may be needed.
- Companies like Cerebras have found ways to use sparsity throughout training, tailored to their hardware.
- Some settings, such as language model inference, are seeing active exploration of sparsity.
- Language model training typically follows a formula: collect lots of data and train a large transformer model, which reliably gives good performance.
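
To make the Monarch takeaway concrete, here is a minimal NumPy sketch of the factorization from the Monarch paper (Dao et al., 2022), not code discussed in the episode: a Monarch matrix factors as M = P L Pᵀ R, where L and R are block-diagonal and P is the fixed permutation that transposes a length-n vector viewed as a √n x √n grid. This structure captures fast transforms like the FFT while keeping matrix-vector cost at O(n^1.5) instead of O(n^2). All function and variable names below are hypothetical.

```python
import numpy as np

def monarch_matvec(x, L_blocks, R_blocks):
    """Compute y = M @ x for a Monarch matrix M = P @ L @ P.T @ R.

    L_blocks, R_blocks: arrays of shape (b, b, b) holding the b
    diagonal blocks (each b x b) of L and R, with n = b * b.
    P is the stride permutation that transposes x viewed as a
    b x b grid -- the same shuffle the FFT uses between stages.
    """
    b = L_blocks.shape[0]
    # Block-diagonal R: block i multiplies the i-th contiguous chunk of x.
    y = np.einsum('bij,bj->bi', R_blocks, x.reshape(b, b))
    # P.T: transpose the b x b grid (an involution, so P.T == P here).
    y = y.T
    # Block-diagonal L acts on the permuted chunks.
    y = np.einsum('bij,bj->bi', L_blocks, y)
    # P: undo the permutation and flatten back to length n.
    return y.T.reshape(b * b)

# Sanity check against the equivalent dense matrix on a tiny example.
rng = np.random.default_rng(0)
b = 4                                   # n = 16
L = rng.standard_normal((b, b, b))
R = rng.standard_normal((b, b, b))
x = rng.standard_normal(b * b)

def as_dense(blocks):
    """Expand (b, b, b) diagonal blocks into a dense n x n matrix."""
    n = b * b
    M = np.zeros((n, n))
    for i in range(b):
        M[i * b:(i + 1) * b, i * b:(i + 1) * b] = blocks[i]
    return M

perm = np.arange(b * b).reshape(b, b).T.reshape(-1)
P = np.eye(b * b)[perm]
M = P @ as_dense(L) @ P.T @ as_dense(R)
assert np.allclose(M @ x, monarch_matvec(x, L, R))
```

The two block-diagonal multiplies each cost O(n · √n), which is where the O(n^1.5) total comes from; a dense layer of the same size would cost O(n^2).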
