Interconnects

Interviewing OLMo 2 leads: Open secrets of training language models

Jan 22, 2025
Luca Soldaini, Data lead for the OLMo project at AI2, joins the discussion to unpack the intricacies of training language models. He shares stories of overcoming pretraining-efficiency challenges and the quest for training stability, especially after a significant 70B model attempt. The conversation covers the strategic decisions behind building effective language-modeling teams, the trade-offs between deep and wide network architectures, and the importance of community-driven advances in AI.
AI Snips
ANECDOTE

OLMo's Origin

  • AI2's OLMo project began after a proposal to AMD was initially ignored.
  • After ChatGPT's release, AMD renewed interest, leading to OLMo's development.
INSIGHT

Initial Model Size

  • OLMo's initial size was inspired by Llama 1's smallest model.
  • The team aimed to recreate Llama 1's 7B model with 1.4 trillion tokens.
ANECDOTE

Learning from OPT and Bloom

  • Early language models like OPT and Bloom, while not high-performing, provided valuable resources.
  • Their extensive logs documented potential training issues, which helped the OLMo team.