
Training superhuman coding models at Cursor
Cursor
Process Reward Models vs Outcome Rewards
They analyze why process verifiers struggle and why outcome-based rewards enable longer RL optimization.
Transcript
