
Training superhuman coding models at Cursor
May 30, 2025
Guests: Roundtable Participant (Speaker 0), Roundtable Participant (Speaker 1), Roundtable Participant (Speaker 2), Roundtable Participant (Speaker 3)
In a lively roundtable, technical experts from Cursor delve into the intricacies of training superhuman AI coding agents. They discuss the unique challenges of coding's large action space and the complexity of ground-truth signals. Highlights include debates on the effectiveness of reinforcement learning rewards, the importance of real-world developer feedback, and advancements in hardware to support longer contexts. The conversation reveals fascinating insights into potential future developments in AI-assisted coding and how these innovations could transform the field.
AI Snips
Coding RL Requires Multi-Step Tool Thinking
- Coding RL differs from math because the action space is much larger and reasoning is embedded in the answer.
- Models must call multiple tools iteratively, so RL needs to optimize multi-step tool interactions (see the rollout sketch below).
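A minimal sketch of what optimizing multi-step tool interactions can look like, assuming hypothetical `policy.next_action` and `env.call_tool` interfaces (neither is a real Cursor API): unlike single-answer math RL, the unit that gets scored is the whole trajectory of tool calls, so the optimizer has to assign credit across intermediate steps rather than to one final answer.

```python
# Hypothetical sketch of a multi-step tool-calling rollout for coding RL.
# The reward attaches to the full trajectory of tool interactions, not to
# a single emitted answer.
from dataclasses import dataclass, field

@dataclass
class Step:
    tool: str          # e.g. "read_file", "edit_file", "run_tests"
    args: dict
    observation: str   # what the environment returned for this call

@dataclass
class Trajectory:
    steps: list = field(default_factory=list)
    reward: float = 0.0

def rollout(policy, env, task, max_steps=32):
    """Run one episode: the policy keeps issuing tool calls until it submits."""
    traj = Trajectory()
    obs = env.reset(task)
    for _ in range(max_steps):
        # Hypothetical policy interface: choose the next tool given the task,
        # the history of steps so far, and the latest observation.
        tool, args = policy.next_action(task, traj.steps, obs)
        if tool == "submit":
            break
        obs = env.call_tool(tool, args)  # hypothetical environment interface
        traj.steps.append(Step(tool, args, obs))
    traj.reward = env.score(task)  # e.g. fraction of tests passing at the end
    return traj
```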
Tests Are Strong But Often Sparse Rewards
- Tests give a near-ground-truth reward when coverage is good, enabling long RL runs against them.
- Sparse test rewards make training computationally expensive and often require breaking tasks into smaller parts (see the reward sketch below).
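A minimal sketch of a test-based reward, assuming the task ships with a runnable test suite (command and paths are placeholders): with good coverage this is close to a ground-truth signal, but it is binary and sparse, since most rollouts score 0.0, which is what makes long RL runs against tests expensive.

```python
# Hypothetical test-suite reward: near ground truth when coverage is good,
# but sparse, since a rollout only scores when the whole suite passes.
import subprocess

def test_suite_reward(repo_dir: str, test_cmd=("pytest", "-q")) -> float:
    """Return 1.0 if the suite passes in repo_dir, else 0.0."""
    result = subprocess.run(list(test_cmd), cwd=repo_dir, capture_output=True)
    return 1.0 if result.returncode == 0 else 0.0
```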
Decompose PRs Into Verifiable Subtasks
- Break large PR tasks into smaller verifiable parts to reduce reward sparsity and improve learning signal.
- Aim for intermediate checks so models get feedback more often than once in a thousand rollouts (see the sketch below).
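A minimal sketch of one way to densify that signal, not Cursor's implementation: score each verifiable subtask of a decomposed PR separately, so partial progress still produces a learning signal instead of only the rare rollout that lands the entire change.

```python
# Hypothetical decomposed reward: average the outcomes of per-subtask checks
# so the model is rewarded for partial progress on a large PR-style task.
def decomposed_reward(subtask_checks, repo_dir: str) -> float:
    """Fraction of subtask checks that pass.

    subtask_checks: hypothetical callables such as migration_applied,
    endpoint_added, or unit_tests_pass, each taking repo_dir and returning
    True/False for one verifiable piece of the decomposed task.
    """
    if not subtask_checks:
        return 0.0
    passed = sum(1 for check in subtask_checks if check(repo_dir))
    return passed / len(subtask_checks)
```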
