
Interconnects

Interviewing OLMo 2 leads: Open secrets of training language models

Jan 22, 2025
Luca Soldaini, the Data lead for the OLMo project at AI2, joins the discussion to walk through what actually goes into training language models. He recounts the challenges of improving pretraining efficiency and the pursuit of training stability, especially after a setback with a 70B model attempt. The conversation covers the strategic decisions behind building effective language modeling teams, the trade-offs between deep and wide network architectures, and the importance of community-driven advancements in AI.
01:12:43

Episode guests

Luca Soldaini (Data lead, OLMo project, AI2)

Podcast summary created with Snipd AI

Quick takeaways

  • The development of OLMo 2 drew on lessons from earlier model setbacks, adapting strategies to improve pretraining efficiency.
  • The team emphasized high-quality data and effective filtering techniques as key to significantly improving model performance.

Deep dives

Origins and Development of OLMo

OLMo's journey began in late 2022, when discussions with AMD about a possible collaboration spurred interest in a large language model project. The team initially planned to extend the BLOOM model by training it on a vast amount of additional data, but the rapid evolution of the AI landscape after ChatGPT led them to pivot. By February 2, 2023, the project had taken shape through collective brainstorming among researchers, marking the formal establishment of OLMo. The effort was a bottom-up initiative, with team members juggling OLMo alongside ongoing commitments to other projects.
