Tri Dao is known for his groundbreaking work on FlashAttention at Stanford, a fast and memory-efficient implementation of the attention mechanism in Transformers that opened the door to much longer sequence lengths in models such as GPT-4 and Anthropic's Claude, as well as in models for images, video, and audio.
We sat down with Tri Dao to discuss the impact of his pioneering work on software/hardware co-design and some of the new innovations coming in the world of Transformers and generative AI.