
The Information Bottleneck EP8: RL with Ahmad Beirami
Oct 7, 2025
Ahmad Beirami, a former Google researcher, dives into the intricacies of reinforcement learning and its role in modern AI models. He highlights the evaluation challenges in AI research and argues for deeper analysis over chasing small benchmark gains. Ahmad also critiques the current conference review system, describing the strain it is under and the problems that creates. The discussion covers agent workflows, the implications of quantization, and the need for better RL evaluation methods, all emphasizing the importance of integrating theoretical insights with empirical work.
Agents Drive Growing Compute Demand
- Demand for GPU compute continues to grow as models enable more autonomous tasks and agents.
- Agent workflows amplify token consumption and create strong demand for efficiency.
Validate Distilled Models Beyond Benchmarks
- For narrow tasks, fine-tune or distill smaller models, but expect hidden generalization losses.
- Validate on broader implicit capabilities (reasoning, instruction following) before deployment.
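One lightweight way to make that validation concrete is to compare the distilled (student) model against the original (teacher) on held-out capability suites, not only the narrow target-task benchmark. A minimal sketch, assuming hypothetical capability names, accuracy scores, and a regression tolerance (none of these come from the episode):

```python
def generalization_regressions(teacher_scores, student_scores, tolerance=0.02):
    """Flag capabilities where the distilled (student) model falls behind
    the teacher by more than `tolerance` absolute accuracy."""
    regressions = {}
    for capability, teacher_acc in teacher_scores.items():
        student_acc = student_scores.get(capability, 0.0)
        drop = teacher_acc - student_acc
        if drop > tolerance:
            regressions[capability] = drop
    return regressions


# Hypothetical held-out suites beyond the narrow target task.
teacher = {"target_task": 0.91, "reasoning": 0.78, "instruction_following": 0.85}
student = {"target_task": 0.92, "reasoning": 0.66, "instruction_following": 0.84}

print(generalization_regressions(teacher, student))
```

Here the student matches the teacher on the target task but would be flagged for a large drop on reasoning, exactly the kind of hidden loss the snip warns about.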
Prefer Verifier RL For Robust Distillation
- Use verifier-based RL fine-tuning to improve generalization when distilling capabilities into smaller models.
- Maintain KL regularization to preserve pretrained capabilities during distillation and multitask learning.
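The KL regularization mentioned above is the standard device in RLHF-style fine-tuning: the verifier's reward is offset by a penalty on how far the fine-tuned policy drifts from the frozen pretrained reference. A toy single-sample sketch (the per-token log-probabilities and the `beta` value are made up for illustration, not taken from the episode):

```python
def kl_regularized_reward(verifier_reward, logp_policy, logp_ref, beta=0.1):
    """Shaped reward r(x, y) - beta * (log pi(y|x) - log pi_ref(y|x)).

    The log-prob difference over the sampled completion is a
    single-sample estimate of the KL penalty that keeps the fine-tuned
    policy close to the pretrained reference, preserving its capabilities.
    """
    kl_estimate = sum(lp - lr for lp, lr in zip(logp_policy, logp_ref))
    return verifier_reward - beta * kl_estimate


# Hypothetical per-token log-probs for one sampled completion.
logp_policy = [-0.2, -0.5, -0.1]   # fine-tuned policy
logp_ref    = [-0.4, -0.6, -0.3]   # frozen pretrained reference

print(kl_regularized_reward(1.0, logp_policy, logp_ref, beta=0.1))
```

With a larger `beta` the policy is pulled harder toward the pretrained model, trading some verifier reward for retained general capabilities.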
