Episode 33: Tri Dao, Stanford: On FlashAttention and sparsity, quantization, and efficient inference

Generally Intelligent

The Opposition to Attention

For the people who are skeptical of this direction, what are their objections? One objection is that attention works fine for the current models. But when you start increasing the sequence length, which matters for lots of applications, attention starts to become unwieldy or becomes a bottleneck again. So we just took a different path, which is: what if we try other alternatives that could also work quite well?
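
As a rough illustration of why attention becomes the bottleneck as sequence length grows (a minimal sketch, not from the episode; the function name and sizes below are illustrative): standard attention materializes an N x N score matrix, so memory and compute scale quadratically with the sequence length N.

    # Minimal sketch: naive attention builds an N x N score matrix,
    # so memory and compute grow quadratically with sequence length N.
    import torch

    def naive_attention(q, k, v):
        # q, k, v: (batch, seq_len, head_dim)
        d = q.shape[-1]
        scores = q @ k.transpose(-2, -1) / d ** 0.5  # (batch, N, N): O(N^2) memory
        weights = torch.softmax(scores, dim=-1)
        return weights @ v

    for n in (1_024, 4_096, 16_384):
        # Bytes for a single fp16 N x N score matrix, per head
        print(f"N={n:>6}: score matrix ~ {2 * n * n / 1e9:.2f} GB per head (fp16)")
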
