The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)

Inside s1: An o1-Style Reasoning Model That Cost Under $50 to Train with Niklas Muennighoff - #721

Mar 3, 2025
Niklas Muennighoff, a PhD student at Stanford, dives into his work on s1, a reasoning model designed to match OpenAI's o1 while costing under $50 to train. He explains techniques such as "budget forcing", which controls how long the model reasons at test time so it can tackle complex problems more effectively. The discussion covers the intricacies of test-time scaling, the importance of data curation, and the differences between supervised fine-tuning and reinforcement learning. Niklas also shares his views on the future of open-source AI models.
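In the s1 paper, budget forcing works at decoding time: to cap reasoning, an end-of-thinking delimiter is appended once a token budget is used up; to lengthen reasoning, the model's attempt to emit that delimiter is suppressed and "Wait" is appended so it keeps thinking. The Python below is a minimal sketch of that control loop, not the s1 implementation. It assumes a hypothetical `next_token` function standing in for one decoding step, uses `<end_think>` as a placeholder delimiter, and treats tokens as plain strings rather than tokenizer IDs.

```python
from typing import Callable, List

# Placeholder strings (hypothetical); a real model uses its own special tokens.
END_OF_THINKING = "<end_think>"
WAIT = "Wait"


def budget_force(
    next_token: Callable[[List[str]], str],
    prompt: List[str],
    min_tokens: int,
    max_tokens: int,
) -> List[str]:
    """Keep the thinking trace between min_tokens and max_tokens (sketch only).

    next_token is a hypothetical stand-in for one decoding step: it receives
    the full context and returns the next token as a string.
    Assumes 0 <= min_tokens <= max_tokens.
    """
    trace: List[str] = []
    while len(trace) < max_tokens:
        token = next_token(prompt + trace)
        if token == END_OF_THINKING:
            if len(trace) < min_tokens:
                # Too early: suppress the delimiter and append "Wait" to extend reasoning.
                trace.append(WAIT)
                continue
            break  # The model stopped on its own within the budget.
        trace.append(token)
    # Close the thinking phase, whether the model stopped or the cap was reached.
    trace.append(END_OF_THINKING)
    return trace


def toy_model(context: List[str]) -> str:
    # Toy stand-in: tries to stop whenever the context length is a multiple of 3.
    return END_OF_THINKING if len(context) % 3 == 0 else "step"


print(budget_force(toy_model, ["Q"], min_tokens=5, max_tokens=100))
# → ['step', 'step', 'Wait', 'step', 'step', '<end_think>']
```

With a minimum of 5 tokens, the toy model's first attempt to stop is overridden with "Wait"; with a small maximum, a model that never stops on its own is cut off at the cap.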
INSIGHT

Replicating o1

  • s1 and DeepSeek's R1 are both attempts to replicate OpenAI's o1, focusing on reasoning performance and test-time scaling.
  • R1 aims to replicate the full training pipeline, while s1 takes a minimal approach (supervised fine-tuning on a small, carefully curated dataset).
ANECDOTE

Niklas's Background

  • Before starting his PhD at Stanford, Niklas Muennighoff studied business and finance at Peking University.
  • He then switched to AI research, worked at Hugging Face, and is now a first-year PhD student.
INSIGHT

Test-Time Scaling Approaches

  • Test-time scaling comes in two main forms: parallel and sequential.
  • Parallel scaling runs multiple independent generations and aggregates their answers, while sequential scaling spends extra compute refining a single reasoning trace.
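The contrast between the two forms can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: `parallel_scale`, `sequential_scale`, and the `sample`/`refine` callables are hypothetical stand-ins for model calls, and majority voting is assumed as the aggregation rule for parallel scaling (one common choice among several).

```python
from collections import Counter
from typing import Callable


def parallel_scale(sample: Callable[[int], str], n: int) -> str:
    """Parallel scaling: draw n independent answers, then aggregate by majority vote."""
    answers = [sample(i) for i in range(n)]  # Each call is independent of the others.
    return Counter(answers).most_common(1)[0][0]


def sequential_scale(refine: Callable[[str], str], trace: str, steps: int) -> str:
    """Sequential scaling: spend extra compute by revising one reasoning trace."""
    for _ in range(steps):
        trace = refine(trace)  # Each step builds on the previous trace.
    return trace


# Toy stand-ins (hypothetical) for model calls.
def toy_sample(i: int) -> str:
    return ["42", "41", "42"][i % 3]


def toy_refine(trace: str) -> str:
    return trace + " -> checked"


print(parallel_scale(toy_sample, n=5))  # → 42
print(sequential_scale(toy_refine, "draft", steps=2))  # → draft -> checked -> checked
```

The parallel calls could run concurrently because none depends on another, whereas each sequential step must wait for the previous one; that dependency is what lets sequential scaling build on earlier reasoning.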