Post‑Training Advances: RL with Verifiable Rewards (RLVR)

Nathan explains RLVR: the model generates answers to verifiable tasks (math, code), the answers are graded, and the model is optimized on those grades. The discussion also covers contamination and evaluation.
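As a rough illustration of the generate, grade, and optimize loop named above, here is a minimal Python sketch of the grading step for a math task. Everything in it is an assumption for illustration and is not taken from the episode: the function names (`extract_final_answer`, `verifiable_reward`, `grade_group`) are hypothetical, the rule "the last number in the completion is the answer" is a simplification, and the mean-baseline advantage merely stands in for whatever signal an actual optimizer would consume. The sampled completions are hard-coded rather than generated by a model.

```python
import re
from typing import List, Optional

# Hypothetical rule: treat the last number in a completion as its final answer.
NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def extract_final_answer(completion: str) -> Optional[str]:
    """Return the last number found in the completion, or None if there is none."""
    matches = NUMBER.findall(completion)
    return matches[-1] if matches else None


def verifiable_reward(completion: str, reference: str) -> float:
    """Grade step: 1.0 if the extracted answer equals the reference, else 0.0."""
    answer = extract_final_answer(completion)
    if answer is None:
        return 0.0
    return 1.0 if float(answer) == float(reference) else 0.0


def grade_group(completions: List[str], reference: str) -> List[float]:
    """Score several samples for one prompt and subtract the group mean.

    The result is a per-sample advantage (an assumed, simplified signal)
    that a policy-gradient style optimizer could use to weight updates.
    """
    rewards = [verifiable_reward(c, reference) for c in completions]
    baseline = sum(rewards) / len(rewards)
    return [r - baseline for r in rewards]


if __name__ == "__main__":
    # Stand-ins for completions sampled from a model (the "generate" step).
    samples = ["2 + 2 = 4", "The answer is 5", "no idea"]
    print(grade_group(samples, reference="4"))
```

Completions that reach the reference answer get a positive advantage and the rest get a negative one, which is the sense in which the reward is "verifiable": it comes from a check against a known answer, not from a learned preference model.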

Play episode from 01:55:44
