Latent Space: The AI Engineer Podcast

2024 in Post-Transformers Architectures (State Space Models, RWKV) [LS Live @ NeurIPS]

Dec 24, 2024
Dan Fu, an AI researcher soon to join UCSD, and Eugene Cheah, CEO of Featherless AI, delve into the future of post-transformer architectures. They discuss innovations like RWKV and state-space models, highlighting their collaborative and open-source nature. The duo examines the challenges of multilingual training and computational efficiency, while also exploring advancements in non-transformer models like Mamba and Jamba. Tune in for insights on scaling models, 'infinite context,' and how new architectures are reshaping the AI landscape!
INSIGHT

Scaling Challenges

  • Scaling up model parameter counts unlocks new capabilities, such as fluent conversational AI.
  • However, attention's compute and memory grow quadratically with context length, demanding ever more resources (see the sketch below).
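
A minimal NumPy sketch (not from the episode) of standard softmax attention, showing where the quadratic cost comes from: the score matrix compares all n tokens against all n tokens. Sizes are illustrative.

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Vanilla attention: the (n, n) score matrix makes the cost O(n^2 * d)."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                    # (n, n) -- the quadratic term
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                               # (n, d)

n, d = 1024, 64                                      # toy sizes; real contexts are far longer
Q, K, V = (np.random.randn(n, d) for _ in range(3))
print(softmax_attention(Q, K, V).shape)              # (1024, 64); the scores alone held n * n floats
```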
INSIGHT

Linear Attention Limitations

  • Attention compares every token against every other token, which is what causes the quadratic scaling.
  • Early linear-attention attempts around 2020 struggled with both output quality and hardware efficiency; the sketch below shows the core idea.
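
A hedged sketch of the linear-attention idea referenced here, in the spirit of the 2020 kernelized-attention work: a positive feature map replaces softmax so the key-value product can be formed first and no n x n matrix is ever materialized. The feature map and sizes are illustrative assumptions, not code from the speakers.

```python
import numpy as np

def elu_plus_one(x):
    """Simple positive feature map standing in for softmax (an illustrative choice)."""
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q, K, V):
    """Non-causal linear attention: cost is O(n * d^2) instead of O(n^2 * d)."""
    Qf, Kf = elu_plus_one(Q), elu_plus_one(K)   # (n, d) each
    kv = Kf.T @ V                               # (d, d) -- formed before touching Q
    z = Qf @ Kf.sum(axis=0)                     # (n,) per-row normalizer
    return (Qf @ kv) / z[:, None]               # (n, d)

n, d = 1024, 64
Q, K, V = (np.random.randn(n, d) for _ in range(3))
print(linear_attention(Q, K, V).shape)          # (1024, 64)
```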
INSIGHT

State Space Models

  • State space models (SSMs) offer a more principled approach to sequence modeling, borrowing ideas from signal processing.
  • SSMs can be applied efficiently as long convolutions computed with the FFT, achieving O(n log n) scaling (sketched below).
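
A rough NumPy sketch of the FFT-convolution trick alluded to here: applying a sequence-length-long filter via the FFT costs O(n log n) rather than O(n^2). The random filter is a stand-in; a real SSM layer would derive this kernel from its learned state-space parameters (A, B, C), which is omitted in this sketch.

```python
import numpy as np

def fft_long_conv(u, k):
    """Causal convolution of signal u with a same-length filter k in O(n log n)."""
    n = u.shape[0]
    fft_len = 2 * n                             # zero-pad so circular FFT conv acts like linear conv
    U = np.fft.rfft(u, n=fft_len)
    Kf = np.fft.rfft(k, n=fft_len)
    return np.fft.irfft(U * Kf, n=fft_len)[:n]

n = 4096
u = np.random.randn(n)                          # input sequence (single channel)
k = np.random.randn(n)                          # length-n filter standing in for the SSM kernel
y = fft_long_conv(u, k)

# Sanity check against the direct O(n^2) convolution.
assert np.allclose(y, np.convolve(u, k)[:n])
print(y.shape)                                  # (4096,)
```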