

Episode 40: DeepSeek facts vs hype, model distillation, and open source competition
Jan 31, 2025
In this discussion, Kate Soule, Director of Technical Product Management for IBM Granite, Chris Hay, Distinguished Engineer and CTO of Customer Transformation, and Aaron Baughman, IBM Fellow and Master Inventor, dive into the realities behind DeepSeek R1. They separate the facts from the hype and discuss what model distillation really means for AI competition. The trio also explores the evolving open-source AI landscape and how these recent advances could reshape industry strategy, with particular attention to efficiency and innovation in model training.
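Model distillation, one of the episode's core topics, trains a small "student" model to imitate a larger "teacher." Below is a minimal sketch of the classic logit-matching variant in PyTorch; the temperature value is an illustrative assumption, and nothing in the code comes from the episode itself:

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Temperature-scaled KL divergence between teacher and student
    output distributions (classic logit distillation)."""
    # Soften both distributions so the student learns from the
    # teacher's full probability mass, not just its top prediction.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    # KL(teacher || student); the T^2 factor keeps gradient magnitudes
    # comparable across different temperatures.
    return F.kl_div(log_soft_student, soft_teacher,
                    reduction="batchmean") * temperature ** 2
```

(DeepSeek's published R1 distillations reportedly used supervised fine-tuning on R1-generated reasoning traces rather than logit matching; the laptop-training snip below follows that trace-based pattern.)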
AI Snips
DeepSeek's Impact: Divided Opinions
- DeepSeek R1's significance is debated: Kate Soule rates it a 5/10.
- Chris Hay, however, gives it a "9.11 or 9.9" out of 10, an apparent nod to the comparison question that famously trips up language models.
Misleading Cost of Training
- The widely cited $5.5M figure for training a model like DeepSeek R1 is misleading.
- It covers only a single training run and excludes the extensive pre-training and experimentation that preceded it; see the back-of-envelope arithmetic below.
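For context on where the headline number comes from: DeepSeek's V3 technical report estimates roughly 2.788 million H800 GPU-hours for the final training run, priced at an assumed $2 per GPU-hour rental rate. A quick check in Python (the rental rate is the report's assumption, not a market quote):

```python
# Back-of-envelope reproduction of the headline training-cost figure.
gpu_hours = 2.788e6       # H800 GPU-hours reported for the final run
usd_per_gpu_hour = 2.0    # assumed rental rate from the V3 report
total_usd = gpu_hours * usd_per_gpu_hour
print(f"${total_usd / 1e6:.3f}M")  # -> $5.576M, the ~$5.5M headline figure
```

That figure omits researcher time, failed runs, ablations, and the infrastructure behind them, which is the snip's point.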
Laptop Training Success
- Chris Hay trained a small 1.5B-parameter model on his laptop.
- With minimal fine-tuning data it achieved GPT-4-level math performance, underscoring the importance of long chain-of-thought training data; a sketch of that style of experiment follows.
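A minimal sketch of that style of experiment, assuming a small Qwen-style 1.5B checkpoint and a toy dataset of long chain-of-thought traces; the model name, data, and hyperparameters are illustrative guesses, not Chris Hay's actual setup:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative stand-ins: any small causal LM plus a few long
# reasoning traces approximate the experiment described above.
model_name = "Qwen/Qwen2.5-1.5B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Tiny "minimal fine-tuning data": prompt, long reasoning trace, answer.
traces = [
    "Q: What is 17 * 24?\n"
    "<think>17 * 24 = 17 * 20 + 17 * 4 = 340 + 68 = 408.</think>\n"
    "A: 408",
]

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
model.train()
for epoch in range(3):
    for text in traces:
        batch = tokenizer(text, return_tensors="pt")
        # Standard causal-LM objective: setting labels = input_ids makes
        # the model learn to reproduce the full reasoning trace, not
        # just the final answer.
        out = model(**batch, labels=batch["input_ids"])
        out.loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```

The design point the snip highlights is the data, not the loss: training on long chain-of-thought traces teaches a small model to reason step by step before answering.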