

Is It Time to Rethink LLM Pre-Training? with Aditi Raghunathan - #747
Sep 16, 2025
In this discussion, Aditi Raghunathan, an assistant professor at Carnegie Mellon University, examines the limitations of large language models (LLMs). She shares insights from her award-winning paper on pushing creativity beyond the limits of next-token prediction, including the 'roll the dice' idea for introducing randomness and 'look before you leap' for planning ahead before generating. The conversation also covers the paradox of 'catastrophic overtraining' and her pursuit of more controllable models through concepts like 'memorization sinks.' Her research aims to reshape how we think about AI adaptability.
AI Snips
Benchmarks Hide Deployment Fragility
- Benchmark performance can mask real-world brittleness when distributions shift slightly from training data.
- Aditi argues we must study when and why models break, looking beyond aggregate benchmark metrics.
More Pretraining Can Hurt Adaptability
- Training on more data can make models worse starting points for fine-tuning, a phenomenon the authors call catastrophic overtraining.
- This inverse effect emerges once the token-to-parameter ratio passes a threshold, making subsequent adaptation harder (the ratio itself is sketched below).
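A minimal sketch of the ratio in question, with illustrative numbers only; the threshold reported in the paper and the exact model sizes discussed in the episode are not restated here, so the figures below are hypothetical.

```python
# Token-to-parameter ratio: total pretraining tokens divided by model parameters.
# The numbers below are hypothetical and only illustrate how the ratio grows as a
# fixed-size model is pretrained on more and more data.

def tokens_per_parameter(num_tokens: float, num_parameters: float) -> float:
    return num_tokens / num_parameters

model_params = 3e9  # a hypothetical 3B-parameter model
for tokens in (3e11, 1.5e12, 3e12):
    ratio = tokens_per_parameter(tokens, model_params)
    print(f"{tokens:.1e} tokens -> {ratio:,.0f} tokens per parameter")
```

Per the snip above, once this ratio passes some threshold, the more heavily pretrained checkpoint can become a worse starting point for fine-tuning, not a better one.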
Test Checkpoints Before Fine-Tuning
- Probe multiple pretraining checkpoints before selecting a base model for fine-tuning, looking for the sweet spot on the U-shaped curve.
- Avoid blindly picking the latest, most heavily pretrained checkpoint when you expect substantial post-training changes; a probing workflow is sketched below.
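A minimal sketch of that probing workflow. The episode does not prescribe tooling, so `load_checkpoint`, `finetune`, and `evaluate` are hypothetical stand-ins for whatever training and evaluation stack you use; they are stubbed out here only so the loop structure is clear and the snippet runs as written.

```python
# Sketch: fine-tune briefly from several pretraining checkpoints and compare
# downstream scores, instead of assuming the most-pretrained checkpoint is best.
# load_checkpoint / finetune / evaluate are hypothetical stubs standing in for
# real training and evaluation code.

CHECKPOINTS = ["ckpt-300B-tokens", "ckpt-1T-tokens", "ckpt-2T-tokens", "ckpt-3T-tokens"]

def load_checkpoint(name: str):
    return {"name": name}   # stub: load the pretrained weights here

def finetune(model, steps: int):
    return model            # stub: run a small, fixed fine-tuning budget

def evaluate(model) -> float:
    return 0.0              # stub: return a downstream validation metric

def probe_checkpoints(checkpoints, budget_steps: int = 500):
    """Give every checkpoint the same small fine-tuning budget, then keep the one
    whose fine-tuned model scores best on the downstream validation set."""
    scores = {}
    for name in checkpoints:
        model = finetune(load_checkpoint(name), steps=budget_steps)
        scores[name] = evaluate(model)
    best = max(scores, key=scores.get)
    return best, scores

if __name__ == "__main__":
    best, scores = probe_checkpoints(CHECKPOINTS)
    print("probe scores:", scores)
    print("best starting checkpoint:", best)
```

The design point is the fixed, small fine-tuning budget: it keeps the comparison fair across checkpoints and keeps the probing pass cheap relative to the full fine-tuning run you commit to afterward.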