

Language Modeling With State Space Models with Dan Fu - #630
May 22, 2023
Join Dan Fu, a PhD student at Stanford, as he dives into the evolving landscape of language modeling. He discusses the limitations of state space models and explores techniques like Flash Attention, which reduces attention's memory footprint so models can process longer sequences. Dan also shares insights on using synthetic languages to probe and improve models, and on the search for subquadratic alternatives to today's attention-based methods. His research points toward language models that can handle far longer contexts.
AI Snips
Quadratic Scaling of Attention
- Attention, the core of large language models (LLMs), scales quadratically with sequence length.
- This is because every token is compared against every other token, which limits practical context length (see the sketch below).
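To make the quadratic cost concrete, here is a minimal NumPy sketch of standard (naive) attention; the shapes and names are illustrative, not from the episode:

```python
import numpy as np

def naive_attention(Q, K, V):
    # Q, K, V: (seq_len, d) arrays for a single attention head.
    d = Q.shape[-1]
    # scores has shape (seq_len, seq_len): every token is compared with
    # every other token, so compute and memory grow as O(seq_len^2).
    scores = Q @ K.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V  # (seq_len, d)

rng = np.random.default_rng(0)
seq_len, d = 1024, 64
Q, K, V = (rng.standard_normal((seq_len, d)) for _ in range(3))
out = naive_attention(Q, K, V)
# Doubling seq_len quadruples the entries in `scores`: 1024^2 -> 2048^2.
```

The (seq_len, seq_len) scores matrix is the bottleneck that the later snips address.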
Human Language Processing vs. LLMs
- LLMs face context length limitations due to quadratic scaling, hindering their ability to retain extensive information.
- Human conversation doesn't require remembering every word, suggesting potential for subquadratic language processing.
Flash Attention and Memory Optimization
- Flash Attention reduces attention's memory usage in LLMs from quadratic to linear in sequence length by computing the softmax in tiles, without ever materializing the full attention matrix (compute remains quadratic).
- This allows much longer sequences (e.g., 32,000 tokens) to fit on GPUs, enabling fine-tuning for extended contexts (see the sketch below).
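Below is a minimal NumPy sketch of the tiling-plus-online-softmax idea behind Flash Attention; the block size and function name are illustrative, and the real implementation is a fused GPU kernel that also tiles over queries:

```python
import numpy as np

def flash_like_attention(Q, K, V, block=256):
    """Block-wise attention with an online softmax, in the spirit of
    Flash Attention: the full (n, n) score matrix is never materialized,
    so peak extra memory is O(n * block) instead of O(n^2)."""
    n, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.zeros_like(Q)
    # Running softmax statistics per query row (m, l as in the paper).
    m = np.full(n, -np.inf)   # running max of scores seen so far
    l = np.zeros(n)           # running sum of exp(scores - m)
    for start in range(0, n, block):
        Kb, Vb = K[start:start + block], V[start:start + block]
        s = (Q @ Kb.T) * scale                  # (n, block): one tile
        m_new = np.maximum(m, s.max(axis=-1))
        correction = np.exp(m - m_new)          # rescale old accumulators
        p = np.exp(s - m_new[:, None])          # tile's unnormalized weights
        l = l * correction + p.sum(axis=-1)
        out = out * correction[:, None] + p @ Vb
        m = m_new
    return out / l[:, None]
```

The result matches naive_attention(Q, K, V) up to floating-point error, but no (n, n) matrix is ever allocated, which is what lets 32,000-token sequences fit in GPU memory.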