
Training superhuman coding models at Cursor
Cursor
Variance Reduction and GRPO Advantages
They compare GRPO, PPO, and other variance-reduction techniques, favoring multiple rollouts per prompt when the value model is weak.
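As a rough illustration of the multiple-rollout idea, here is a minimal sketch of a group-relative advantage in the style of GRPO: each rollout's reward is compared with the mean reward of the other rollouts sampled for the same prompt, so no learned value model is needed as a baseline. The function name, the `eps` constant, the example rewards, and the choice of population standard deviation are assumptions made for illustration, not details from the episode.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Score each rollout against the group sampled for the same prompt.

    The group mean acts as the baseline that a value model would
    otherwise have to predict. Dividing by the group's standard
    deviation is one common normalization (an assumption here).
    """
    baseline = mean(rewards)
    spread = pstdev(rewards)
    return [(r - baseline) / (spread + eps) for r in rewards]


# Hypothetical rewards for four rollouts of one prompt (1.0 = tests passed).
rollout_rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rollout_rewards)
```

Rollouts that beat the group average get a positive advantage and the rest get a negative one. If every rollout earns the same reward, all advantages are zero and that prompt contributes no gradient signal.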