The Information Bottleneck

EP8: RL with Ahmad Beirami

Oct 7, 2025
Ahmad Beirami, a former Google researcher, dives into the intricacies of reinforcement learning and its relevance to AI models. He highlights evaluation challenges in AI research and argues for a shift toward deeper analysis rather than chasing small gains. Ahmad also critiques the current conference review system, describing the strain it is under and the problems that creates. The conversation covers agent workflows, the implications of quantization, and the need for better RL evaluation methods, emphasizing throughout the importance of integrating theoretical insights with empirical work.
AI Snips
INSIGHT

Agents Drive Growing Compute Demand

  • Demand for GPU compute continues to grow as models enable more autonomous tasks and agents.
  • Agent workflows amplify token consumption and create strong demand for efficiency (a rough token estimate is sketched below).
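
As a rough illustration of why agent workflows multiply compute demand: each step of an agent loop re-reads a growing context and generates new output, so total tokens processed grow much faster than for a single model call. All numbers and the function name below are made up for illustration, not figures from the episode.

```python
def agent_token_estimate(steps=8, context_tokens=4_000, output_tokens=800, context_growth=1.3):
    """Back-of-the-envelope token count for a multi-step agent loop.

    Each step processes the current context and appends its output (tool results
    are folded into `context_growth`) before the next step runs.
    """
    total = 0
    ctx = context_tokens
    for _ in range(steps):
        total += ctx + output_tokens                     # prompt + generated tokens this step
        ctx = int(ctx * context_growth) + output_tokens  # next step sees a longer context
    return total

single_call = 4_000 + 800
print(f"agent loop ~ {agent_token_estimate():,} tokens vs single call ~ {single_call:,}")
```

With these illustrative numbers an eight-step loop processes roughly 30x the tokens of a single call, which is the amplification effect behind the growing demand for efficiency.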
ADVICE

Validate Distilled Models Beyond Benchmarks

  • For narrow tasks, fine-tune or distill smaller models, but expect hidden generalization losses.
  • Validate on broader implicit capabilities (reasoning, instruction following) before deployment; a small comparison harness is sketched after this list.
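
One way to operationalize this advice is a small harness that scores the distilled student and the original teacher on the narrow target task plus broader capability suites, and flags regressions. Everything here (suite names, file paths, `score_fn`, the regression threshold) is a hypothetical placeholder for whatever evaluation setup is actually in use.

```python
# Hypothetical capability suites; swap in real eval sets from your own harness.
CAPABILITY_SUITES = {
    "target_task": "evals/narrow_task.jsonl",
    "reasoning": "evals/reasoning.jsonl",
    "instruction_following": "evals/instruction_following.jsonl",
}

def validate_distilled(score_fn, student, teacher, max_regression=0.05):
    """Flag hidden generalization losses: the student should stay within
    `max_regression` of the teacher on every suite, not just the target task.

    score_fn(model, suite_path) -> accuracy in [0, 1], supplied by your harness.
    """
    report = {}
    for name, suite_path in CAPABILITY_SUITES.items():
        student_score = score_fn(student, suite_path)
        teacher_score = score_fn(teacher, suite_path)
        report[name] = {
            "student": student_score,
            "teacher": teacher_score,
            "regression": teacher_score - student_score,
            "ok": teacher_score - student_score <= max_regression,
        }
    return report
```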
ADVICE

Prefer Verifier RL For Robust Distillation

  • Use verifier-based RL fine-tuning to improve generalization when distilling capabilities into smaller models.
  • Maintain KL regularization to preserve pretrained capabilities during distillation and multitask learning; a minimal KL-regularized objective is sketched below.
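
A minimal sketch of what such an objective can look like, assuming per-response log-probabilities under the policy and a frozen reference (pretrained) model are already available. The function name, the `beta` value, and the binary verifier reward are illustrative assumptions, not the specific setup discussed in the episode.

```python
import torch

def verifier_rl_loss(policy_logprobs, ref_logprobs, verifier_rewards, beta=0.05):
    """REINFORCE-style surrogate with a verifier reward and a KL penalty
    toward the frozen reference model.

    policy_logprobs:  (batch,) summed log-probs of sampled responses under the policy
    ref_logprobs:     (batch,) summed log-probs of the same responses under the reference
    verifier_rewards: (batch,) e.g. 1.0 if the verifier accepts the answer, else 0.0
    beta:             strength of the KL regularizer (illustrative value)
    """
    # Per-sample estimate of KL(policy || reference) on the sampled responses.
    kl = policy_logprobs - ref_logprobs
    # Shaped reward: verifier score minus the KL penalty, treated as a constant
    # with respect to the policy gradient (hence detach).
    shaped_reward = (verifier_rewards - beta * kl).detach()
    # Batch-mean baseline reduces gradient variance.
    advantage = shaped_reward - shaped_reward.mean()
    # Policy-gradient surrogate: maximizing shaped reward == minimizing this loss.
    return -(advantage * policy_logprobs).mean()

# Toy usage with stand-in tensors (in practice these come from the models).
policy_lp = torch.tensor([-12.3, -9.8, -15.1], requires_grad=True)
ref_lp = torch.tensor([-11.9, -10.2, -14.0])
rewards = torch.tensor([1.0, 0.0, 1.0])
loss = verifier_rl_loss(policy_lp, ref_lp, rewards)
loss.backward()
```

The KL term is what keeps the fine-tuned policy close to the pretrained model, which is the mechanism behind the advice to preserve pretrained capabilities during distillation and multitask learning.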