
Episode 33: Tri Dao, Stanford: On FlashAttention and sparsity, quantization, and efficient inference
Generally Intelligent
The Future of Recurrent Models
The idea is that recurrent models don't need to store the entire history when they do inference. Instead, they compress the history into a fixed-size state vector. One example is RWKV, a recurrent model that some folks have trained up to 14 billion parameters, and it seems to perform on par with transformers. That's not as proven yet, but it's an exciting direction.
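A minimal sketch of the point being made, with hypothetical dimensions and a generic tanh update rather than RWKV's actual update rule: recurrent inference carries a fixed-size state forward, so memory stays constant no matter how long the sequence gets, whereas a transformer's KV cache grows with every token.

```python
import numpy as np

# Hypothetical sizes for illustration only.
d_state, d_in = 64, 32
rng = np.random.default_rng(0)
A = rng.normal(size=(d_state, d_state)) * 0.1  # state transition
B = rng.normal(size=(d_state, d_in)) * 0.1     # input projection

state = np.zeros(d_state)              # fixed-size state vector
for t in range(10_000):                # stream tokens one at a time
    x_t = rng.normal(size=d_in)        # stand-in for a token embedding
    state = np.tanh(A @ state + B @ x_t)  # compress history into the state

# Memory for the history stays O(d_state), independent of sequence length.
# A transformer's KV cache would instead grow as O(t * d_model).
```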