FlashAttention 2: making Transformers 800% faster w/o approximation - with Tri Dao of Together AI

Latent Space: The AI Engineer Podcast

CHAPTER

Memory Architectures in GPU Performance

This chapter explores the differences between high bandwidth memory (HBM) and static random-access memory (SRAM) in GPU architecture, detailing how algorithms can optimize data movement to enhance performance. It highlights the significance of historical techniques in modern memory management and discusses the collaborative research environment at Hazy Research. Additionally, the chapter addresses the evolving roles of academia and industry in AI development, emphasizing the importance of evaluation methods and benchmarks in guiding model advancements.
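As a rough illustration of the HBM/SRAM tradeoff discussed here (not code from the episode), the sketch below counts memory traffic for a naive versus a blocked matrix multiply. Holding small tiles in fast on-chip SRAM lets each value fetched from slow HBM be reused many times, which is the general principle behind IO-aware algorithms like FlashAttention. The matrix size `n` and tile size `t` are arbitrary assumptions for the example.

```python
# Hedged sketch: comparing modeled HBM traffic (element reads + writes) for a
# naive n x n matmul versus a blocked one with t x t tiles kept in SRAM.
# This is a simplified cost model, not a measurement of real hardware.

def naive_hbm_traffic(n: int) -> int:
    # Naive matmul: each of the n*n outputs streams a full row and a full
    # column from HBM (2*n reads), plus one write per output.
    return n * n * (2 * n + 1)

def tiled_hbm_traffic(n: int, t: int) -> int:
    # Blocked matmul: for each of the (n/t)^3 tile-level products, load two
    # t x t tiles into SRAM once; each output element is written once.
    blocks = n // t                      # assume t divides n for simplicity
    loads = blocks ** 3 * 2 * t * t      # two t x t tile loads per product
    stores = n * n                       # one write per output element
    return loads + stores

if __name__ == "__main__":
    n, t = 1024, 64
    print("naive:", naive_hbm_traffic(n))
    print("tiled:", tiled_hbm_traffic(n, t))
```

With these assumed sizes the tiled version moves roughly 60x fewer elements through HBM, which is why reorganizing computation around the memory hierarchy can pay off far more than reducing raw FLOPs.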
