
The Information Bottleneck EP15: The Information Bottleneck and Scaling Laws with Alex Alemi
Nov 13, 2025
In this discussion, Alex Alemi, an AI researcher at Anthropic who previously worked at Google Brain and Disney, delves into the concept of the information bottleneck. He explains how it captures the aspects of data essential to a task while discarding the rest, guarding against overfitting. Alemi also discusses scaling laws, showing how small-scale experiments can forecast the behavior of larger models. He offers insights on why compression matters for understanding models, and challenges researchers to pursue ambitious questions with broader implications for society, such as job disruption.
Information Bottleneck As A Middle Path
- The Information Bottleneck (IB) frames learning as extracting only the bits of data relevant to a downstream variable while compressing the rest.
- IB sits between purely predictive and fully generative (Bayesian) views, forming useful conditional representations.
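The tradeoff the snip describes is usually written as a Lagrangian over a stochastic encoder $p(z \mid x)$ (the standard Tishby-style formulation, not a quote from the episode):

```latex
\min_{p(z \mid x)} \; I(X; Z) \;-\; \beta \, I(Z; Y)
```

Here $I(X; Z)$ is the compression term (bits of the input retained in the representation $Z$), $I(Z; Y)$ is the relevance term (bits about the downstream variable $Y$), and $\beta$ sets where on the spectrum between pure compression and pure prediction the representation lands.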
Compression As A Generalization Guard
- Compression acts like a PAC-Bayesian KL constraint that limits overfitting by keeping posteriors close to priors.
- Applying compression to intermediate representations gives practical conditional generalization guarantees.
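One common form of the PAC-Bayesian bound makes the "KL constraint" concrete (a textbook statement, not a quote from the episode): with probability at least $1 - \delta$ over an $n$-sample training set,

```latex
\mathbb{E}_{\theta \sim Q}\!\left[L(\theta)\right]
\;\le\;
\mathbb{E}_{\theta \sim Q}\!\left[\hat{L}(\theta)\right]
+ \sqrt{\frac{\mathrm{KL}(Q \,\|\, P) + \ln \frac{2\sqrt{n}}{\delta}}{2n}}
```

where $P$ is a prior over parameters, $Q$ the learned posterior, and $L$, $\hat{L}$ the true and empirical risks. Keeping $\mathrm{KL}(Q \,\|\, P)$ small, i.e. compressing, directly tightens the generalization gap.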
Implicit Compression In Overparameterized Models
- Overparameterized neural nets appear to encode rich information, but marginalizing over random seeds shows that most of the representational information is seed-dependent and task-irrelevant.
- This implicit compression explains why large networks generalize well despite huge capacity.
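A minimal NumPy sketch of the intuition (illustrative only, not Alemi's experiment): random-feature "networks" built from different seeds produce very different hidden representations, yet each supports the same downstream prediction, so the seed-specific bits carry no task-relevant information and wash out under marginalization.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, h = 2000, 5, 64
X = rng.standard_normal((n, d))
y = (X[:, 0] > 0).astype(float)  # the task depends on one direction only

def represent(X, seed):
    """Frozen random ReLU features: a stand-in for a seed-dependent hidden layer."""
    W = np.random.default_rng(seed).standard_normal((h, X.shape[1]))
    return np.maximum(X @ W.T, 0.0)

def fit_readout(Z, y):
    """Least-squares linear readout on top of the frozen representation."""
    w, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return w

accs, reps = [], []
for seed in (1, 2, 3):
    Z = represent(X, seed)
    w = fit_readout(Z, y)
    accs.append(((Z @ w > 0.5) == (y > 0.5)).mean())
    reps.append(Z)

# The representations differ substantially across seeds...
diff = np.abs(reps[0] - reps[1]).mean()
# ...yet every seed solves the task: the shared, task-relevant bits are what
# survive marginalization over seeds; the rest is implicitly compressed away.
print(accs, diff)
```

Each seed reaches high accuracy even though the raw hidden activations disagree, which is the sense in which "most representational information is irrelevant."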
