
EP8: RL with Ahmad Beirami
The Information Bottleneck
Evaluation Confounds: Temperature and Sampling
The hosts discuss how sampling choices and framework differences create confounds in LLM benchmarks.
Transcript
