FlashAttention 2: making Transformers 800% faster w/o approximation - with Tri Dao of Together AI

Latent Space: The AI Engineer Podcast

Memory Architectures in GPU Performance

This chapter explores the differences between high-bandwidth memory (HBM) and static random-access memory (SRAM) in GPU architecture, detailing how algorithms can minimize data movement between them to improve performance. It highlights how long-standing memory-management techniques inform modern GPU kernels and discusses the collaborative research environment at Hazy Research. Additionally, the chapter addresses the evolving roles of academia and industry in AI development, emphasizing the importance of evaluation methods and benchmarks in guiding model advancements.
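To make the HBM-vs-SRAM point concrete, here is a minimal NumPy sketch of the tiling and online-softmax idea the chapter alludes to: attention is computed block by block so the full N x N score matrix never has to be written out to slow memory. This is an illustration only, not the FlashAttention kernel itself; the real implementation runs on the GPU with tiles held in SRAM, and the function name and block size below are purely illustrative.

```python
import numpy as np

def tiled_attention(Q, K, V, block_size=64):
    """Block-wise attention with an online softmax (illustrative sketch).

    Processes keys/values one tile at a time, keeping only per-row running
    statistics, so the full score matrix is never materialized at once.
    """
    n, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.zeros_like(Q)

    for i in range(0, n, block_size):
        q = Q[i:i + block_size]                   # query tile
        m = np.full(q.shape[0], -np.inf)          # running row maxima
        l = np.zeros(q.shape[0])                  # running softmax denominators
        acc = np.zeros_like(q)                    # running weighted sum of V

        for j in range(0, n, block_size):
            k = K[j:j + block_size]               # key tile
            v = V[j:j + block_size]               # value tile
            s = (q @ k.T) * scale                 # scores for this tile only

            m_new = np.maximum(m, s.max(axis=1))  # updated row maxima
            p = np.exp(s - m_new[:, None])        # tile probabilities (unnormalized)
            correction = np.exp(m - m_new)        # rescale previously accumulated partials
            l = l * correction + p.sum(axis=1)
            acc = acc * correction[:, None] + p @ v
            m = m_new

        out[i:i + block_size] = acc / l[:, None]
    return out


# Sanity check against a naive full-matrix attention.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((256, 32)) for _ in range(3))
S = (Q @ K.T) / np.sqrt(32)
P = np.exp(S - S.max(axis=1, keepdims=True))
ref = (P / P.sum(axis=1, keepdims=True)) @ V
assert np.allclose(tiled_attention(Q, K, V), ref, atol=1e-6)
```

On a GPU the same structure means each tile's scores live in fast on-chip SRAM and only the inputs and the final output touch HBM, which is where the speedup in the episode's title comes from.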
