Latent Space: The AI Engineer Podcast

MPT-7B and The Beginning of Context=Infinity — with Jonathan Frankle and Abhinav Venigalla of MosaicML

May 20, 2023
Jonathan Frankle, Chief Scientist at MosaicML, and Abhinav Venigalla, Research Scientist at MosaicML, dive into the MPT-7B model. They discuss how its StoryWriter variant's 65k-token context window extrapolates to inputs of 84,000 tokens, how the base model was trained on 1 trillion tokens, and how it matches the quality of comparable open models at a fraction of the usual cost. The two also navigate the complexities of AI model training, ethical considerations in creative generation, and the balance between open research and business interests, offering insights into the future of AI.
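The long-context behavior referenced in the title comes from ALiBi (Attention with Linear Biases), which MPT-7B uses instead of learned positional embeddings: each attention head penalizes its attention logits in proportion to how far back a key is, which lets the model run at sequence lengths beyond those seen in training. A minimal sketch of the bias computation in Python (the slope formula follows the ALiBi paper for a power-of-two head count; this is an illustration, not MosaicML's implementation):

import torch

def alibi_bias(n_heads: int, seq_len: int) -> torch.Tensor:
    # Per-head slopes: a geometric sequence 2^(-8/n), 2^(-16/n), ...,
    # as in the ALiBi paper (assumes n_heads is a power of two).
    slopes = torch.tensor([2.0 ** (-8.0 * (h + 1) / n_heads) for h in range(n_heads)])
    # rel[q, k] = k - q: zero on the diagonal, increasingly negative for older keys.
    pos = torch.arange(seq_len)
    rel = pos[None, :] - pos[:, None]
    # Add this to the attention logits; future positions (rel > 0) are
    # removed by the usual causal mask, so only the past-key penalty matters.
    return slopes[:, None, None] * rel[None, :, :]  # (n_heads, seq_len, seq_len)

# Example: 4 heads, 8 positions; keys farther in the past get larger penalties.
print(alibi_bias(4, 8)[0])

Because the penalty is a function of distance rather than a learned table of positions, nothing in the model is tied to the training sequence length, which is what makes the 65k-to-84k extrapolation possible.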
AI Snips
ANECDOTE

MosaicML's Shift to Model Building

  • MosaicML wasn't initially planning to build its own model.
  • They changed their mind to demonstrate their platform's capabilities and respond to community challenges.
INSIGHT

The Mystery of LLM Data

  • Data selection for LLMs is crucial but poorly understood; the effects of data quality and of repeating data remain unclear.
  • C4, a dataset from 2019 with seemingly arbitrary preprocessing choices, still performs surprisingly well (a way to inspect it is sketched below).
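A quick way to get a feel for C4's preprocessing is to stream a few documents from a copy hosted on the Hugging Face Hub. A minimal Python sketch, assuming the datasets library is installed and using the allenai/c4 mirror (an assumption of this sketch, not a source the episode names):

import itertools
from datasets import load_dataset  # pip install datasets

# Stream the English split so nothing is downloaded up front.
c4 = load_dataset("allenai/c4", "en", split="train", streaming=True)

# Peek at a handful of documents to see what the cleaning heuristics kept.
for example in itertools.islice(c4, 3):
    print(example["url"])
    print(example["text"][:200])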
INSIGHT

Evaluating LLMs

  • Evaluating LLMs is difficult: existing metrics don't capture real-world usage.
  • Human evaluation would be ideal but is impractical at scale, forcing reliance on imperfect automated metrics and informal "vibe checks" (one such automated metric is sketched below).
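One common automated metric of the kind mentioned here is log-likelihood scoring of multiple-choice answers: the model picks whichever candidate completion it assigns the highest probability. A minimal Python sketch using transformers, with gpt2 standing in for any causal LM; the prompt and candidates are illustrative assumptions, not from the episode:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def answer_logprob(prompt: str, answer: str) -> float:
    # Assumes the prompt's tokens are a prefix of the prompt+answer tokens,
    # which holds here because each candidate begins with a space.
    prompt_ids = tok(prompt, return_tensors="pt").input_ids
    full_ids = tok(prompt + answer, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    # log_probs[i] is the distribution over the token at position i + 1.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    start = prompt_ids.shape[1] - 1
    # Sum log-probs of the answer tokens, each conditioned on its prefix.
    return sum(
        log_probs[i, full_ids[0, i + 1]].item()
        for i in range(start, full_ids.shape[1] - 1)
    )

prompt = "The capital of France is"
candidates = [" Paris", " London", " Berlin"]
scores = {c: answer_logprob(prompt, c) for c in candidates}
print(max(scores, key=scores.get))

Scoring fixed candidates sidesteps free-form generation entirely, which is exactly why such metrics are cheap, reproducible, and, as the episode notes, imperfect proxies for real-world usage.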