

Grokking, Generalization Collapse, and the Dynamics of Training Deep Neural Networks with Charles Martin - #734
Jun 5, 2025
In this conversation, Charles Martin, founder of Calculation Consulting and an AI researcher who applies tools from theoretical physics to machine learning, introduces WeightWatcher, an open-source tool for diagnosing Deep Neural Networks without access to training data. He explains his Heavy-Tailed Self-Regularization (HTSR) theory and how it reveals distinct training phases, including grokking and generalization collapse. The discussion covers fine-tuning models, the counterintuitive relationship between model quality and hallucinations, and the practical challenges of deploying generative AI in real-world applications.
WeightWatcher Origin Story
- Charles Martin founded WeightWatcher to help evaluate AI models without access to data.
- His background applying theoretical physics tools to AI shaped the tool's approach to analyzing model weight matrices.
Training Layers Like Cake Baking
- Proper training requires balancing learning across all of a model's layers, much like baking the layers of a cake evenly.
- Overheating some layers causes them to overfit and stops the model from generalizing well (a sketch of the corresponding diagnostic follows below).
Challenges in Fine-Tuning
- Fine-tuning large models on real data is challenging because data pipelines are messy and constantly changing.
- When access to data is limited, use tools like WeightWatcher to monitor models and detect training problems (a usage sketch follows below).