FlashAttention 2: making Transformers 800% faster w/o approximation - with Tri Dao of Together AI

Latent Space: The AI Engineer Podcast

NOTE

How to Optimize for the Hardware Lottery

The speaker discusses the challenges of running models efficiently on different hardware, emphasizing the dominance of NVIDIA GPUs and of the software frameworks built around Transformer models. They describe a hardware and software lottery: popular architectures get optimized for specific hardware and software stacks, which makes it harder to run alternative models efficiently. Compilers could help by optimizing code for different devices, but they acknowledge that a good general solution is hard to find. In their own research, they have to weigh both new algorithms/models and hardware compatibility.
