
Training superhuman coding models at Cursor
Cursor
Optimizing for Real-World Human Rewards (25:53)
They debate whether to optimize for signals such as user acceptance, rerolls, and Pass@K, and how to derive reward models from users' choices.
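This summary names Pass@K as one candidate signal but does not say how Cursor computes or uses it. For reference only, the sketch below implements the unbiased pass@k estimator commonly used in code-generation evaluation (popularized by OpenAI's Codex work). It assumes n sampled completions for one task, of which c pass some check such as a test suite. The function name `pass_at_k` is illustrative and is not taken from the episode.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the probability that at least one of k samples is correct.

    n: total samples drawn for a task
    c: number of those samples that passed the check (e.g. unit tests)
    k: number of attempts the user (or evaluator) gets
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # If there are fewer than k incorrect samples, every size-k subset
    # must contain at least one correct sample.
    if n - c < k:
        return 1.0
    # 1 minus the probability that a random size-k subset is all incorrect.
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # 10 samples, 3 correct: the chance a single try succeeds is 0.3.
    print(pass_at_k(10, 3, 1))
```

One loose way to connect this to rerolls: a user who may reroll up to k times behaves somewhat like a pass@k evaluator, so a signal that rewards pass@k (rather than pass@1) tolerates more variance across samples. How the speakers actually weigh these signals is discussed in the episode itself.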