The Information Bottleneck

EP8: RL with Ahmad Beirami

Oct 7, 2025
Ahmad Beirami, a former Google researcher, dives into the intricacies of reinforcement learning and its relevance to AI models. He highlights evaluation challenges in AI research and argues for a shift toward deeper analysis rather than chasing small gains. Ahmad also critiques the current conference review system, describing the strain it is under and the problems that creates. The conversation covers agent workflows, the implications of quantization, and the need for better RL evaluation methods, emphasizing throughout the importance of integrating theoretical insights with empirical work.
AI Snips
INSIGHT

Agents Drive Growing Compute Demand

  • Demand for GPU compute continues to grow as models enable more autonomous tasks and agents.
  • Agent workflows amplify token consumption and create strong demand for efficiency (a rough token estimate is sketched below).
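
As a rough illustration of why agent workflows multiply compute demand: each step of an agent loop re-reads a growing context and generates new output, so total tokens processed grow much faster than for a single model call. All numbers and the function name below are made up for illustration, not figures from the episode.

```python
def agent_token_estimate(steps=8, context_tokens=4_000, output_tokens=800, context_growth=1.3):
    """Back-of-the-envelope token count for a multi-step agent loop.

    Each step processes the current context and appends its output (tool results
    are folded into `context_growth`) before the next step runs.
    """
    total = 0
    ctx = context_tokens
    for _ in range(steps):
        total += ctx + output_tokens                     # prompt + generated tokens this step
        ctx = int(ctx * context_growth) + output_tokens  # next step sees a longer context
    return total

single_call = 4_000 + 800
print(f"agent loop ~ {agent_token_estimate():,} tokens vs single call ~ {single_call:,}")
```

With these illustrative numbers an eight-step loop processes roughly 30x the tokens of a single call, which is the amplification effect behind the growing demand for efficiency.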
ADVICE

Validate Distilled Models Beyond Benchmarks

  • For narrow tasks, fine-tune or distill smaller models, but expect hidden generalization losses.
  • Validate on broader implicit capabilities (reasoning, instruction following) before deployment; a small comparison harness is sketched after this list.
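
One way to operationalize this advice is a small harness that scores the distilled student and the original teacher on the narrow target task plus broader capability suites, and flags regressions. Everything here (suite names, file paths, `score_fn`, the regression threshold) is a hypothetical placeholder for whatever evaluation setup is actually in use.

```python
# Hypothetical capability suites; swap in real eval sets from your own harness.
CAPABILITY_SUITES = {
    "target_task": "evals/narrow_task.jsonl",
    "reasoning": "evals/reasoning.jsonl",
    "instruction_following": "evals/instruction_following.jsonl",
}

def validate_distilled(score_fn, student, teacher, max_regression=0.05):
    """Flag hidden generalization losses: the student should stay within
    `max_regression` of the teacher on every suite, not just the target task.

    score_fn(model, suite_path) -> accuracy in [0, 1], supplied by your harness.
    """
    report = {}
    for name, suite_path in CAPABILITY_SUITES.items():
        student_score = score_fn(student, suite_path)
        teacher_score = score_fn(teacher, suite_path)
        report[name] = {
            "student": student_score,
            "teacher": teacher_score,
            "regression": teacher_score - student_score,
            "ok": teacher_score - student_score <= max_regression,
        }
    return report
```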
ADVICE

Prefer Verifier RL For Robust Distillation

  • Use verifier-based RL fine-tuning to improve generalization when distilling capabilities into smaller models.
  • Maintain KL regularization to preserve pretrained capabilities during distillation and multitask learning; a minimal KL-regularized objective is sketched below.
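
A minimal sketch of what such an objective can look like, assuming per-response log-probabilities under the policy and a frozen reference (pretrained) model are already available. The function name, the `beta` value, and the binary verifier reward are illustrative assumptions, not the specific setup discussed in the episode.

```python
import torch

def verifier_rl_loss(policy_logprobs, ref_logprobs, verifier_rewards, beta=0.05):
    """REINFORCE-style surrogate with a verifier reward and a KL penalty
    toward the frozen reference model.

    policy_logprobs:  (batch,) summed log-probs of sampled responses under the policy
    ref_logprobs:     (batch,) summed log-probs of the same responses under the reference
    verifier_rewards: (batch,) e.g. 1.0 if the verifier accepts the answer, else 0.0
    beta:             strength of the KL regularizer (illustrative value)
    """
    # Per-sample estimate of KL(policy || reference) on the sampled responses.
    kl = policy_logprobs - ref_logprobs
    # Shaped reward: verifier score minus the KL penalty, treated as a constant
    # with respect to the policy gradient (hence detach).
    shaped_reward = (verifier_rewards - beta * kl).detach()
    # Batch-mean baseline reduces gradient variance.
    advantage = shaped_reward - shaped_reward.mean()
    # Policy-gradient surrogate: maximizing shaped reward == minimizing this loss.
    return -(advantage * policy_logprobs).mean()

# Toy usage with stand-in tensors (in practice these come from the models).
policy_lp = torch.tensor([-12.3, -9.8, -15.1], requires_grad=True)
ref_lp = torch.tensor([-11.9, -10.2, -14.0])
rewards = torch.tensor([1.0, 0.0, 1.0])
loss = verifier_rl_loss(policy_lp, ref_lp, rewards)
loss.backward()
```

The KL term is what keeps the fine-tuned policy close to the pretrained model, which is the mechanism behind the advice to preserve pretrained capabilities during distillation and multitask learning.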