Mamba, Mamba-2 and Post-Transformer Architectures for Generative AI with Albert Gu - #693

The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)

NOTE

Developing Sequence Models for Handling Large Context

The speaker's interest in sequence modeling grew out of conceptual questions about model capabilities, in particular how to handle long or potentially infinite context. Handling very large context was the core motivation when they began working on sequence models, and it drew them toward recurrent models rather than transformers because of the unique appeal of the recurrent approach.
