

Is It Time to Rethink LLM Pre-Training? with Aditi Raghunathan - #747
Sep 16, 2025
In this discussion, Aditi Raghunathan, an assistant professor at Carnegie Mellon University, examines the limitations of large language models (LLMs). She shares insights from her award-winning paper on pushing creativity beyond the limits of next-token prediction, including the 'roll the dice' idea for introducing randomness and 'look before you leap' for planning ahead before generating. The conversation also covers the paradox of 'catastrophic overtraining' and her pursuit of more controllable models through concepts like 'memorization sinks.' Her research aims to reshape how we think about AI adaptability.
AI Snips
Benchmarks Hide Deployment Fragility
- Benchmark performance can mask real-world brittleness when distributions shift slightly from training data.
- Aditi argues we must study when and why models break, looking beyond aggregate benchmark metrics.
More Pretraining Can Hurt Adaptability
- Training on more data can make models worse starting points for fine-tuning, a phenomenon the authors call catastrophic overtraining.
- This inverse effect emerges once the token-to-parameter ratio passes a threshold, making subsequent adaptation harder (the ratio itself is sketched below).
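A minimal sketch of the ratio in question, with illustrative numbers only; the threshold reported in the paper and the exact model sizes discussed in the episode are not restated here, so the figures below are hypothetical.

```python
# Token-to-parameter ratio: total pretraining tokens divided by model parameters.
# The numbers below are hypothetical and only illustrate how the ratio grows as a
# fixed-size model is pretrained on more and more data.

def tokens_per_parameter(num_tokens: float, num_parameters: float) -> float:
    return num_tokens / num_parameters

model_params = 3e9  # a hypothetical 3B-parameter model
for tokens in (3e11, 1.5e12, 3e12):
    ratio = tokens_per_parameter(tokens, model_params)
    print(f"{tokens:.1e} tokens -> {ratio:,.0f} tokens per parameter")
```

Per the snip above, once this ratio passes some threshold, the more heavily pretrained checkpoint can become a worse starting point for fine-tuning, not a better one.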
Test Checkpoints Before Fine-Tuning
- Probe multiple pretraining checkpoints before selecting a base model for fine-tuning, looking for the sweet spot on the U-shaped curve.
- Avoid blindly picking the latest, most heavily pretrained checkpoint when you expect substantial post-training changes; a probing workflow is sketched below.
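A minimal sketch of that probing workflow. The episode does not prescribe tooling, so `load_checkpoint`, `finetune`, and `evaluate` are hypothetical stand-ins for whatever training and evaluation stack you use; they are stubbed out here only so the loop structure is clear and the snippet runs as written.

```python
# Sketch: fine-tune briefly from several pretraining checkpoints and compare
# downstream scores, instead of assuming the most-pretrained checkpoint is best.
# load_checkpoint / finetune / evaluate are hypothetical stubs standing in for
# real training and evaluation code.

CHECKPOINTS = ["ckpt-300B-tokens", "ckpt-1T-tokens", "ckpt-2T-tokens", "ckpt-3T-tokens"]

def load_checkpoint(name: str):
    return {"name": name}   # stub: load the pretrained weights here

def finetune(model, steps: int):
    return model            # stub: run a small, fixed fine-tuning budget

def evaluate(model) -> float:
    return 0.0              # stub: return a downstream validation metric

def probe_checkpoints(checkpoints, budget_steps: int = 500):
    """Give every checkpoint the same small fine-tuning budget, then keep the one
    whose fine-tuned model scores best on the downstream validation set."""
    scores = {}
    for name in checkpoints:
        model = finetune(load_checkpoint(name), steps=budget_steps)
        scores[name] = evaluate(model)
    best = max(scores, key=scores.get)
    return best, scores

if __name__ == "__main__":
    best, scores = probe_checkpoints(CHECKPOINTS)
    print("probe scores:", scores)
    print("best starting checkpoint:", best)
```

The design point is the fixed, small fine-tuning budget: it keeps the comparison fair across checkpoints and keeps the probing pass cheap relative to the full fine-tuning run you commit to afterward.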