The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)

Grokking, Generalization Collapse, and the Dynamics of Training Deep Neural Networks with Charles Martin - #734

Jun 5, 2025
In this conversation, Charles Martin, founder of Calculation Consulting and an AI researcher who combines physics with machine learning, introduces WeightWatcher, an open-source tool for analyzing deep neural networks without access to training or test data. He explores Heavy-Tailed Self-Regularization (HTSR) theory and how it reveals training phases such as grokking and generalization collapse. The discussion covers fine-tuning models, the counterintuitive relationship between model quality and hallucinations, and the challenges of deploying generative AI, offering practical lessons for real-world applications.
ANECDOTE

WeightWatcher Origin Story

  • Charles Martin founded WeightWatcher to help evaluate AI models without access to data.
  • His background applying tools from theoretical physics to AI shaped the tool's approach to model analysis.
INSIGHT

Training Layers Like Cake Baking

  • Proper training requires balancing learning across all of a model's layers, much as the layers of a cake must bake evenly.
  • Over-training ("overheating") some layers causes overfitting and prevents the model from generalizing well; see the sketch below.
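The episode itself contains no code; the following is a minimal sketch of how the open-source weightwatcher Python package can flag unevenly trained layers. The ResNet-18 model and the roughly [2, 6] range for the fitted power-law exponent alpha are illustrative assumptions taken from WeightWatcher's public documentation, not from the conversation.

```python
# Minimal sketch: spot unevenly trained ("overheated") layers with WeightWatcher.
# Assumptions: per WeightWatcher's documented heuristic, a fitted power-law
# exponent alpha in roughly [2, 6] suggests a well-trained layer.
import weightwatcher as ww
import torchvision.models as models

# Any supported PyTorch/Keras model works; ResNet-18 is an arbitrary example.
model = models.resnet18(weights="IMAGENET1K_V1")

watcher = ww.WeightWatcher(model=model)
details = watcher.analyze()  # pandas DataFrame, one row per analyzable layer

# Flag layers whose alpha falls outside the well-trained range:
# alpha < 2 hints at over-training (over-fit), alpha > 6 at under-training.
suspect = details[(details.alpha < 2) | (details.alpha > 6)]
print(suspect[["layer_id", "alpha"]])
```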
ADVICE

Challenges in Fine-Tuning

  • Fine-tuning large models on real data is challenging due to messy and changing data pipelines.
  • When access to data is limited, use tools like WeightWatcher to monitor models and detect training problems directly from the weights; a sketch follows below.
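As one possible workflow (assumed here, not spelled out in the episode), WeightWatcher's data-free summary metrics can be compared between a base model and a fine-tuned checkpoint to catch degradation without touching the data pipeline. The model name and checkpoint path below are hypothetical placeholders.

```python
# Sketch of data-free monitoring across fine-tuning checkpoints.
# Assumptions: the base model name and the local checkpoint path are
# hypothetical; any PyTorch nn.Module works with WeightWatcher.
import weightwatcher as ww
from transformers import AutoModel

base = AutoModel.from_pretrained("distilbert-base-uncased")
tuned = AutoModel.from_pretrained("./checkpoints/finetuned")  # hypothetical path

for tag, model in [("base", base), ("fine-tuned", tuned)]:
    watcher = ww.WeightWatcher(model=model)
    details = watcher.analyze()
    summary = watcher.get_summary(details)  # dict of aggregate metrics
    # A mean alpha drifting well below 2 after fine-tuning is a warning
    # sign of over-fit or generalization collapse.
    print(tag, summary.get("alpha"), summary.get("alpha_weighted"))
```

Because the analysis reads only the weight matrices, it runs the same way whether or not the original training data is available, which is the point of the advice above.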