Machine Learning Street Talk (MLST)

Prof. Randall Balestriero - LLMs without pretraining and SSL

Apr 23, 2025
Randall Balestriero, an AI researcher renowned for his work on self-supervised learning and geographic bias, explores several counterintuitive findings in AI training. He reveals that large language models can perform well even without extensive pre-training. Randall also highlights how similar self-supervised and supervised learning really are, and how much room both still have for improvement. Finally, he discusses biases in climate models, demonstrating the risks of relying on their predictions for vulnerable regions and the policy implications that follow.
INSIGHT

LLMs Train Well From Scratch

  • Large language models (LLMs) with billions of parameters can be trained from random initialization on small supervised datasets without severe overfitting.
  • This challenges the assumption that massive pre-training is always necessary for good performance on specific tasks.
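
A minimal sketch of this setup, assuming the Hugging Face transformers and PyTorch libraries: instantiate a decoder-only model with random weights and train it directly on a small labeled dataset. The scaled-down sizes and synthetic data below are placeholders for illustration, not the configuration discussed in the episode.

```python
# Sketch: train a transformer classifier from random initialization on a small
# supervised dataset (toy sizes and data; the episode concerns billion-parameter models).
import torch
from torch.utils.data import DataLoader, TensorDataset
from transformers import LlamaConfig, LlamaForSequenceClassification

config = LlamaConfig(
    vocab_size=32000,
    hidden_size=512,          # scaled down for the sketch
    intermediate_size=1024,
    num_hidden_layers=8,
    num_attention_heads=8,
    num_labels=2,             # binary classification task
    pad_token_id=0,           # needed for batched sequence classification
)
model = LlamaForSequenceClassification(config)   # random weights, no checkpoint loaded

# Toy supervised data standing in for a small task-specific dataset.
input_ids = torch.randint(1, config.vocab_size, (256, 64))
labels = torch.randint(0, 2, (256,))
loader = DataLoader(TensorDataset(input_ids, labels), batch_size=16, shuffle=True)

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
model.train()
for epoch in range(3):
    for batch_ids, batch_labels in loader:
        out = model(input_ids=batch_ids, labels=batch_labels)   # returns cross-entropy loss
        out.loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```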
INSIGHT

Pre-training Benefits Limited Tasks

  • For specific classification tasks, randomly initialized models trained from scratch can achieve performance comparable to costly pre-trained LLMs.
  • Pre-training mainly benefits open-ended generation tasks; for many discriminative tasks its advantage is limited.
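
Read as an experiment, this comparison amounts to running the same fine-tuning recipe from two starting points and scoring both on the same held-out split. The sketch below assumes a placeholder checkpoint name and a simple accuracy helper; neither is taken from the episode.

```python
# Sketch: same architecture, two initializations, scored on one discriminative task.
import torch
from transformers import AutoConfig, AutoModelForSequenceClassification

checkpoint = "meta-llama/Llama-2-7b-hf"   # placeholder base model
config = AutoConfig.from_pretrained(checkpoint, num_labels=2, pad_token_id=0)

scratch_model = AutoModelForSequenceClassification.from_config(config)   # random initialization
pretrained_model = AutoModelForSequenceClassification.from_pretrained(
    checkpoint, num_labels=2, pad_token_id=0                             # pre-trained weights
)

@torch.no_grad()
def accuracy(model, loader):
    """Held-out accuracy; loader yields (input_ids, labels) batches."""
    model.eval()
    correct = total = 0
    for input_ids, batch_labels in loader:
        preds = model(input_ids=input_ids).logits.argmax(dim=-1)
        correct += (preds == batch_labels).sum().item()
        total += batch_labels.numel()
    return correct / total

# Both models would go through the same fine-tuning loop as in the earlier sketch
# before being compared with accuracy(model, test_loader) on an identical split.
```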
INSIGHT

Generalization vs Specialized Models

  • Task-specific models trained from scratch excel on narrow, in-distribution tasks but generalize poorly out of distribution.
  • Pre-trained LLMs offer broader reasoning ability on open-ended or novel tasks that go beyond the training data.
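
The trade-off described here is what an in-distribution versus out-of-distribution evaluation would surface. Continuing the sketches above (and reusing the hypothetical accuracy helper and scratch_model), the toy loaders below stand in for a held-out split from the training distribution and a shifted one; a real study would shift the domain or dataset rather than the token range.

```python
# Sketch: compare the from-scratch model on "familiar" vs "shifted" toy data.
import torch
from torch.utils.data import DataLoader, TensorDataset

def toy_loader(low, high, n=128, seq_len=64):
    """Random (input_ids, label) batches; the token range crudely stands in for a distribution."""
    ids = torch.randint(low, high, (n, seq_len))
    labels = torch.randint(0, 2, (n,))
    return DataLoader(TensorDataset(ids, labels), batch_size=16)

in_dist_loader = toy_loader(1, 16000)        # stand-in for the training distribution
shifted_loader = toy_loader(16000, 32000)    # stand-in for a distribution shift

print("in-distribution accuracy:    ", accuracy(scratch_model, in_dist_loader))
print("out-of-distribution accuracy:", accuracy(scratch_model, shifted_loader))
```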