
EP8: RL with Ahmad Beirami
The Information Bottleneck
What RL Really Does in LLMs
Ahmad argues that current RL in LLMs often amounts to distillation using online rollouts, and he highlights blind spots in evaluation.
Transcript
