Fusing Kernels and Flash Attention Benefits

Kunle explains how fusing kernels across the whole decoder extends the flash attention approach and greatly reduces memory bandwidth requirements.

Play episode from 19:28

