The MAD Podcast with Matt Turck

State of LLMs 2026: RLVR, GRPO, Inference Scaling — Sebastian Raschka

Jan 29, 2026
Sebastian Raschka, AI researcher and educator known for practical ML guides and his book on building LLMs, walks through 2025–2026 shifts in large models. He compares architectures like transformers, world models, and text diffusion. He explains RLVR and GRPO post-training methods, warns about benchmark gaming, and highlights inference‑time scaling and private data as key drivers.
INSIGHT

Transformers Still Dominate

  • Transformers remain the state-of-the-art scaffold, with many effective tweaks rather than full replacements.
  • Alternatives like diffusion and state-space models trade costs and capabilities and aren't yet superior overall.
INSIGHT

RLVR + GRPO Unlock Reasoning

  • RLVR (reinforcement learning with verifiable rewards) unlocks reasoning by rewarding objectively verifiable outcomes like math or code.
  • GRPO pairs with RLVR to cut costs and complexity by removing learned reward/value models and using relative comparisons within a group of sampled answers (see the sketch after this list).
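
To make the group-relative idea behind GRPO concrete, here is a hedged Python sketch of the per-group advantage computation that stands in for a learned value/critic model. The function name group_relative_advantages and the toy reward values are illustrative assumptions, not taken from any particular library.

```python
# Minimal sketch of GRPO's group-relative advantage computation
# (illustrative only; names and values are hypothetical).
import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Normalize rewards within a group of G sampled responses to one prompt.

    GRPO replaces a learned value/critic model with this per-group baseline:
    each response's advantage is its reward relative to the group mean,
    scaled by the group's standard deviation.
    """
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Example: 4 sampled answers to the same math prompt, scored 1.0 if the
# final answer verifies against the reference, else 0.0 (a verifiable reward).
rewards = torch.tensor([1.0, 0.0, 0.0, 1.0])
advantages = group_relative_advantages(rewards)
# The two verified answers get positive advantages, the others negative,
# so the policy update pushes probability mass toward correct responses
# without ever training a separate reward or value model.
print(advantages)
```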
ADVICE

Prefer Verifiable Rewards When Possible

  • Use verifiable checks (parsers, unit tests, Wolfram) as rewards to avoid training large reward models.
  • Replace learned reward/value models with algorithmic verification where possible to reduce memory and cost (a minimal sketch follows below).
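
As a hedged illustration of verifiable rewards, the Python sketch below shows two algorithmic checks standing in for a learned reward model: an exact-match answer parser and a pass/fail unit-test runner. The function names, the "Answer:" output format, and the use of exec() are assumptions for illustration, not a prescribed setup.

```python
# Sketch of verifiable rewards: algorithmic checks score each completion,
# so no large learned reward model is needed (names are illustrative).

def math_answer_reward(completion: str, reference: str) -> float:
    """Reward 1.0 if the text after a final 'Answer:' marker exactly matches
    the reference string, else 0.0. A parser plays the role of the reward model."""
    marker = "Answer:"
    if marker not in completion:
        return 0.0
    predicted = completion.rsplit(marker, 1)[-1].strip()
    return 1.0 if predicted == reference.strip() else 0.0


def code_reward(candidate_src: str, test_src: str) -> float:
    """Reward 1.0 if generated code passes the supplied unit tests, else 0.0.
    exec() is used here for brevity; real pipelines sandbox this step."""
    namespace: dict = {}
    try:
        exec(candidate_src, namespace)   # define the candidate function(s)
        exec(test_src, namespace)        # assert-based tests raise on failure
        return 1.0
    except Exception:
        return 0.0


# Example usage:
print(math_answer_reward("Let x = 7 ... Answer: 42", "42"))   # 1.0
print(code_reward("def add(a, b):\n    return a + b",
                  "assert add(2, 3) == 5"))                    # 1.0
```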