Interconnects

New Talk: Building Olmo 3 Think

Dec 10, 2025
Dive into the latest breakthroughs in reinforcement learning models and the pressing need for more open reasoning systems. Discover insights on architectural choices, the necessity of long-context data, and how high-quality datasets shape effective training. Uncover the advantages of DPO, its impressive speed gains over RL, and learn about the RLVR pipeline for performance enhancement. Lastly, explore evaluation strategies that keep model comparisons robust while tackling practical efficiency in post-training.
INSIGHT

Clean Base Models Are Essential For RL Research

  • Reasoning models built around reinforcement learning require a clean, verifiable base model before RL research results can be trusted.
  • Data contamination or training artifacts can create spurious RL gains that mask true algorithmic effects; a simple overlap check (sketched below) helps rule this out.
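
A minimal sketch of the kind of contamination check this implies, assuming a simple word-level n-gram overlap heuristic; the function names, the 13-gram window, and the corpora are illustrative, not the actual Olmo decontamination pipeline:

```python
# Illustrative n-gram contamination check: flag training documents that share
# long n-grams with evaluation prompts, since such overlap can inflate
# apparent RL gains. Not the real Olmo pipeline; a minimal sketch only.
from typing import Iterable, List, Set, Tuple


def ngrams(text: str, n: int = 13) -> Set[Tuple[str, ...]]:
    """Return the set of word-level n-grams in a text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def contaminated(doc: str, eval_ngrams: Set[Tuple[str, ...]], n: int = 13) -> bool:
    """A document is flagged if any of its n-grams also appears in an eval prompt."""
    return not ngrams(doc, n).isdisjoint(eval_ngrams)


def filter_corpus(docs: Iterable[str], eval_prompts: Iterable[str], n: int = 13) -> List[str]:
    """Drop training documents that overlap with evaluation prompts."""
    eval_ngrams = set().union(*(ngrams(p, n) for p in eval_prompts))
    return [d for d in docs if not contaminated(d, eval_ngrams, n)]
```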
ADVICE

Design Architecture For Post-Training Efficiency

  • Design the pretraining architecture with post-training inference and RL costs in mind to avoid large additional GPU expenses.
  • Add memory-efficient mechanisms such as grouped-query attention (GQA, sketched below) to drastically reduce inference and RL costs.
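
A minimal PyTorch sketch of grouped-query attention, where several query heads share each key/value head so the key/value cache shrinks during inference and RL rollouts; the dimensions and module name are illustrative, not the Olmo 3 configuration:

```python
# Minimal grouped-query attention (GQA) sketch: n_kv_heads < n_heads, so each
# key/value head serves a group of query heads and the KV cache shrinks by a
# factor of n_heads / n_kv_heads during inference and RL rollouts.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GroupedQueryAttention(nn.Module):
    def __init__(self, d_model: int = 512, n_heads: int = 8, n_kv_heads: int = 2):
        super().__init__()
        assert n_heads % n_kv_heads == 0
        self.n_heads, self.n_kv_heads = n_heads, n_kv_heads
        self.head_dim = d_model // n_heads
        self.q_proj = nn.Linear(d_model, n_heads * self.head_dim, bias=False)
        self.k_proj = nn.Linear(d_model, n_kv_heads * self.head_dim, bias=False)
        self.v_proj = nn.Linear(d_model, n_kv_heads * self.head_dim, bias=False)
        self.o_proj = nn.Linear(n_heads * self.head_dim, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, _ = x.shape
        q = self.q_proj(x).view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
        k = self.k_proj(x).view(b, t, self.n_kv_heads, self.head_dim).transpose(1, 2)
        v = self.v_proj(x).view(b, t, self.n_kv_heads, self.head_dim).transpose(1, 2)
        # Each key/value head is shared by n_heads // n_kv_heads query heads.
        k = k.repeat_interleave(self.n_heads // self.n_kv_heads, dim=1)
        v = v.repeat_interleave(self.n_heads // self.n_kv_heads, dim=1)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.o_proj(out.transpose(1, 2).reshape(b, t, -1))
```

With n_heads = 8 and n_kv_heads = 2 as above, the cached keys and values are a quarter the size of standard multi-head attention, which is where the inference and rollout savings come from.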
ADVICE

Select Pretraining Data With Tiny-Model Sweeps

  • Train small probe models on candidate data subsources and regress their downstream loss against the mixture to find the best pretraining mix.
  • Sweep many tiny models to quantify which data improves final performance before scaling up (see the sketch after this list).
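
An illustrative sketch of such a sweep, assuming three hypothetical subsources and made-up probe losses; a plain least-squares fit stands in for whatever regressor the actual pipeline uses:

```python
# Illustrative tiny-model data-mixture sweep: each probe run trains a small
# model on one mixture of subsources and records a downstream loss; a linear
# regression from mixture weights to loss then suggests which subsources help.
# Mixtures and losses below are made-up stand-ins, not Olmo 3 data.
import numpy as np

# Rows: mixture weights over (web, code, math) used for individual probe runs.
mixtures = np.array([
    [0.8, 0.1, 0.1],
    [0.6, 0.3, 0.1],
    [0.6, 0.1, 0.3],
    [0.4, 0.3, 0.3],
    [0.2, 0.4, 0.4],
])
# Downstream losses observed after each probe run (hypothetical numbers).
losses = np.array([2.95, 2.81, 2.78, 2.70, 2.74])

# Fit loss ~ w . mixture + b with ordinary least squares.
X = np.hstack([mixtures, np.ones((len(mixtures), 1))])
coef, *_ = np.linalg.lstsq(X, losses, rcond=None)
w, b = coef[:-1], coef[-1]

# More negative coefficients mark subsources whose extra weight lowers the loss.
for name, c in zip(["web", "code", "math"], w):
    print(f"{name:>5}: {c:+.3f}")

# Pick the candidate mixture with the lowest predicted loss before scaling up.
candidates = np.random.default_rng(0).dirichlet(np.ones(3), size=1000)
best = candidates[np.argmin(candidates @ w + b)]
print("suggested mixture (web, code, math):", np.round(best, 2))
```

A linear fit like this can only rank subsources; in practice a richer regressor or a constrained search would be needed to capture diminishing returns within a mixture.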