
Episode 33: Tri Dao, Stanford: On FlashAttention and sparsity, quantization, and efficient inference
Generally Intelligent
Sparse vs Dense Matrices for Different Applications
Sparse matrices can work well for image classification models but may not perform as well for language models trained on large datasets.
For applications that can leverage fast transforms, such as audio or image classification with convolutions, structured sparse matrices like Monarch can work well (a minimal sketch of this structure follows below).
For applications with less structure, such as language models, denser matrices may be needed.
Companies like Cerebras have found ways to use sparsity throughout training, tailored to their hardware.
Sparsity is also being explored in certain settings, such as language model inference.
Language model training typically follows a standard formula: collect lots of data and train a large Transformer model to get good performance.
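
To make the structured-sparsity idea concrete, here is a minimal sketch of a Monarch-style matrix-vector product: the dense matrix is replaced by two block-diagonal factors interleaved with a fixed permutation, which cuts parameters and compute from O(n^2) to roughly O(n * sqrt(n)). This is an illustrative sketch loosely following the structure described in the Monarch paper; the function name, block layout, and sizes are assumptions, not a reference implementation from the episode.

```python
import numpy as np

def monarch_matvec(blocks_R, blocks_L, x):
    """Multiply a Monarch-structured matrix by a vector x of length n = m*m.

    Up to a fixed 'riffle' permutation P, the matrix is M = P @ L @ P.T @ R,
    where R and L are block-diagonal with m blocks of size m x m each.
    Cost is O(n * sqrt(n)) instead of O(n^2) for a dense matrix.
    """
    m = len(blocks_R)          # number of blocks per factor (illustrative)
    n = m * m
    assert x.shape == (n,)

    # Block-diagonal R: each block acts on a contiguous chunk of length m.
    y = x.reshape(m, m)                       # row b is chunk b
    y = np.einsum("bij,bj->bi", blocks_R, y)  # block b times chunk b

    # Permutation P.T: interleave chunks so L mixes information across them.
    y = y.T.copy()

    # Block-diagonal L on the permuted chunks.
    y = np.einsum("bij,bj->bi", blocks_L, y)

    # Permutation P: undo the interleave and flatten back to length n.
    return y.T.reshape(n)

# Usage: a 16x16 Monarch matrix here uses 2 * 4 * (4*4) = 128 parameters
# instead of 256 for a dense matrix; the gap widens as n grows.
m = 4
rng = np.random.default_rng(0)
blocks_R = rng.standard_normal((m, m, m))
blocks_L = rng.standard_normal((m, m, m))
x = rng.standard_normal(m * m)
print(monarch_matvec(blocks_R, blocks_L, x).shape)  # (16,)
```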