The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)

Recurrence and Attention for Long-Context Transformers with Jacob Buckman - #750

Oct 7, 2025
Jacob Buckman, co-founder and CEO of Manifest AI, dives deep into the world of long-context transformers. He discusses innovative techniques like windowed attention and the Power Retention approach, which melds attention and recurrence for dramatic training speedups. Buckman also shares insights on Manifest AI's open-source tools, Vidrial and PowerCoder, and explores the significance of metrics for measuring context utility. Learn about the balance between state and weights for compute-optimal architectures and the future potential of context length in AI.
AI Snips
INSIGHT

Context As An Independent Scaling Axis

  • Context length is a distinct axis of scale that governs a model's ability to synthesize large inputs like long text or video.
  • Improving context utilization during pretraining narrows the gap on downstream tasks when extra context is available.
INSIGHT

In-Context Learning Curves Reveal Utility

  • In-context learning curves, measured as negative log likelihood (NLL) at each token position, show that each additional token of context can yield roughly log-linear improvements in prediction.
  • Evaluating NLL during pretraining gives a clean way to quantify how much additional context helps prediction; a sketch of this measurement follows below.
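
A minimal sketch of this kind of measurement, assuming a standard causal language model: average the per-position NLL over held-out documents, so the value at position t reflects prediction quality given t tokens of prior context. The model name ("gpt2"), the placeholder documents, and the 512-token cap are illustrative assumptions, not details from the episode.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # assumption: any causal LM works; longer-context models reveal more of the curve
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

# Assumption: replace with real held-out long documents.
docs = ["Long held-out text goes here. " * 80]

max_len = 512                        # cap on context positions to evaluate
nll_sum = torch.zeros(max_len)
nll_count = torch.zeros(max_len)

with torch.no_grad():
    for doc in docs:
        ids = tokenizer(doc, return_tensors="pt").input_ids[:, :max_len]
        logits = model(ids).logits
        # NLL of token t+1 given tokens 0..t (shift logits by one position)
        logp = F.log_softmax(logits[:, :-1], dim=-1)
        tok_nll = -logp.gather(-1, ids[:, 1:].unsqueeze(-1)).squeeze(-1)[0]
        n = tok_nll.shape[0]
        nll_sum[:n] += tok_nll
        nll_count[:n] += 1

curve = nll_sum / nll_count.clamp(min=1)  # average NLL vs. number of context tokens
```

Plotting `curve` against a log-scaled position axis is what makes the roughly log-linear benefit of extra context visible.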
INSIGHT

Retention Connects Attention And Recurrence

  • Many long-context architectures (Mamba, RetNet, SSMs) are structurally similar to Transformers with different time-mixing primitives.
  • Retention unifies attention and recurrence: the same layer admits both a recurrent form and an attention-equivalent parallel form, as the sketch below illustrates.
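
A toy illustration of that duality, assuming the simplest linear-attention-style retention (not Manifest AI's Power Retention kernels specifically): the same outputs can be computed either as a causally masked attention-like matrix product over the whole sequence, or recurrently with a fixed-size state updated token by token.

```python
import torch

torch.manual_seed(0)
T, d = 6, 4                            # sequence length, head dimension
q, k, v = (torch.randn(T, d) for _ in range(3))

# Attention-equivalent (parallel) form: causal q k^T scores, no softmax.
scores = (q @ k.T).tril()              # zero out attention to future positions
out_parallel = scores @ v              # shape (T, d)

# Recurrent form: fixed-size state S_t = sum_{s<=t} k_s v_s^T, read out with q_t.
S = torch.zeros(d, d)
out_recurrent = torch.empty(T, d)
for t in range(T):
    S = S + torch.outer(k[t], v[t])    # accumulate key-value outer products
    out_recurrent[t] = q[t] @ S        # matches the masked matrix product above

print(torch.allclose(out_parallel, out_recurrent, atol=1e-5))  # True
```

The parallel form gives attention-style training throughput, while the recurrent form gives constant memory per step at inference, which is the trade-off this family of dual-form layers is designed to bridge.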