
EP8: RL with Ahmad Beirami
The Information Bottleneck
Human Data, Verifiers, and Dynamic Exploration
They debate human-labeled data needs, verifiers for RL, and dynamic data acquisition for learning.