
EP8: RL with Ahmad Beirami
The Information Bottleneck
RL Fine-Tuning, Verifiers, and KL Regularization
Ahmad explains how RL-style fine-tuning with verifiers induces KL regularization and aids generalization.
Chapter starts at 12:21