
EA Forum Podcast (Curated & popular): “How Well Does RL Scale?” by Toby_Ord
Nov 3, 2025
Toby Ord, a philosopher focused on global catastrophic risks and AI safety, examines how reinforcement learning (RL) scales for language models. He argues that RL delivers most of its gains by letting models think for much longer, but that scaling RL training compute is far less efficient than scaling inference compute. Deployment costs rise sharply with the number of inference tokens, and further performance gains from RL demand astronomical increases in compute. Ord closes by discussing the governance implications of these findings, suggesting a shift in strategies for anticipating AI progress.
AI Snips
RL Mainly Unlocks Longer Reasoning
- Reinforcement learning (RL) for LLMs produces most gains by enabling much longer chains of thought.
- These gains from longer reasoning matter more than the modest boost to underlying capability at a fixed answer length.
Two Distinct Forms Of Compute Scaling
- I distinguish two kinds of compute scaling: RL training compute and inference compute at deployment.
- RL training improves the model's reasoning technique, while inference scaling lets it think longer on each problem (see the toy sketch below).
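
To make the two axes concrete, here is a minimal sketch assuming a simple log-linear scaling relationship; the slope and all numbers are illustrative assumptions, not figures from the episode.

```python
import math

# Toy log-linear scaling model. ASSUMPTION: every 10x of RL training
# compute adds a fixed number of benchmark points ("slope"). The numbers
# below are made up for illustration.

def score(base: float, slope: float, compute_multiple: float) -> float:
    """Score after multiplying training compute by `compute_multiple`."""
    return base + slope * math.log10(compute_multiple)

def compute_multiple_for_gain(gain: float, slope: float) -> float:
    """Invert the model: required compute grows exponentially with the gain."""
    return 10 ** (gain / slope)

# With a hypothetical slope of 5 points per 10x of compute:
print(score(60.0, slope=5, compute_multiple=100))  # 100x compute: 60 -> 70 points
print(compute_multiple_for_gain(5, slope=5))       # 10x compute for +5 points
print(compute_multiple_for_gain(15, slope=5))      # 1,000x compute for +15 points
```

Under this assumption, each fixed increment of performance costs a constant multiple of compute, which is why further RL gains soon demand astronomical budgets.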
Plan For Higher Deployment Costs
- Expect deployment costs to rise sharply when reasoning uses much longer chains of thought.
- Factor increased per-query inference costs into AI governance and safety planning (a back-of-envelope example follows).
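
As a back-of-envelope illustration of the deployment-cost point (the token counts and price are hypothetical placeholders, not figures from the episode or any provider):

```python
# Per-query deployment cost under longer chains of thought.
# ASSUMPTION: $0.01 per 1K output tokens and a 500-token answer are
# placeholder values chosen only to show the scaling.
PRICE_PER_1K_OUTPUT_TOKENS = 0.01  # USD, illustrative

def cost_per_query(reasoning_tokens: int, answer_tokens: int = 500) -> float:
    """Cost scales linearly with total output tokens, so a long chain
    of thought multiplies the price of every single query."""
    return (reasoning_tokens + answer_tokens) / 1000 * PRICE_PER_1K_OUTPUT_TOKENS

print(cost_per_query(0))       # answer only:          $0.005
print(cost_per_query(50_000))  # 50K reasoning tokens: $0.505, roughly 100x more
```

Unlike training compute, which is paid once, these inference costs recur on every query, which is what makes them relevant to governance and safety planning.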

