
How to train a Million Context LLM — with Mark Huang of Gradient.ai

Latent Space: The AI Engineer Podcast

NOTE

Balancing Context and Scalability in Language Models

The core challenge with longer context windows is that self-attention scales quadratically with sequence length in memory and compute, so training cost grows rapidly as context is extended. Extending the context therefore involves real trade-offs. Training large language models with long context is complex, and the literature does not settle whether it is better to extend the context of an existing model after pre-training or to train an inherently long-context model from the start, because the scaling properties of the two approaches have not been analyzed in detail.
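To make the quadratic scaling concrete, here is a back-of-the-envelope sketch (not from the episode) of the memory needed just to hold one layer's attention-score matrix. The head count and fp16 precision are illustrative assumptions, not Gradient.ai's actual configuration.

```python
def attention_score_memory_gb(seq_len: int, num_heads: int = 32, bytes_per_value: int = 2) -> float:
    """Memory (GB) for a single layer's attention-score matrix of shape
    (num_heads, seq_len, seq_len) at the given precision."""
    return num_heads * seq_len * seq_len * bytes_per_value / 1e9

# Doubling the sequence length roughly quadruples the score-matrix memory.
for seq_len in (8_192, 131_072, 1_048_576):
    print(f"{seq_len:>9} tokens -> {attention_score_memory_gb(seq_len):,.1f} GB per layer")
# ~4 GB at 8K context, ~1,100 GB at 128K, ~70,000 GB at 1M (per layer),
# which is why naive full attention cannot simply be stretched to a million tokens.
```

In practice, techniques like blockwise or ring attention avoid materializing this full matrix, but the underlying quadratic compute cost remains.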
