
Training superhuman coding models at Cursor
May 30, 2025
Guests: Roundtable Participant (Speaker 0), Roundtable Participant (Speaker 1), Roundtable Participant (Speaker 2), Roundtable Participant (Speaker 3)
In a lively roundtable, technical experts from Cursor delve into the intricacies of training superhuman AI coding agents. They discuss the unique challenges of coding's large action space and the complexity of ground-truth signals. Highlights include debates on the effectiveness of reinforcement learning rewards, the importance of real-world developer feedback, and advancements in hardware to support longer contexts. The conversation reveals fascinating insights into potential future developments in AI-assisted coding and how these innovations could transform the field.
AI Snips
Coding RL Requires Multi-Step Tool Thinking
- Coding RL differs from math because the action space is much larger and reasoning is embedded in the answer.
- Models must call multiple tools iteratively, so RL needs to optimize multi-step tool interactions (see the rollout sketch below).
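A minimal sketch of what optimizing multi-step tool interactions can look like, assuming hypothetical `policy.next_action` and `env.call_tool` interfaces (neither is a real Cursor API): unlike single-answer math RL, the unit that gets scored is the whole trajectory of tool calls, so the optimizer has to assign credit across intermediate steps rather than to one final answer.

```python
# Hypothetical sketch of a multi-step tool-calling rollout for coding RL.
# The reward attaches to the full trajectory of tool interactions, not to
# a single emitted answer.
from dataclasses import dataclass, field

@dataclass
class Step:
    tool: str          # e.g. "read_file", "edit_file", "run_tests"
    args: dict
    observation: str   # what the environment returned for this call

@dataclass
class Trajectory:
    steps: list = field(default_factory=list)
    reward: float = 0.0

def rollout(policy, env, task, max_steps=32):
    """Run one episode: the policy keeps issuing tool calls until it submits."""
    traj = Trajectory()
    obs = env.reset(task)
    for _ in range(max_steps):
        # Hypothetical policy interface: choose the next tool given the task,
        # the history of steps so far, and the latest observation.
        tool, args = policy.next_action(task, traj.steps, obs)
        if tool == "submit":
            break
        obs = env.call_tool(tool, args)  # hypothetical environment interface
        traj.steps.append(Step(tool, args, obs))
    traj.reward = env.score(task)  # e.g. fraction of tests passing at the end
    return traj
```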
Tests Are Strong But Often Sparse Rewards
- Tests give a near-ground-truth reward when coverage is good, enabling long RL runs against them.
- Sparse test rewards make training computationally expensive and often require breaking tasks into smaller parts (see the reward sketch below).
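A minimal sketch of a test-based reward, assuming the task ships with a runnable test suite (command and paths are placeholders): with good coverage this is close to a ground-truth signal, but it is binary and sparse, since most rollouts score 0.0, which is what makes long RL runs against tests expensive.

```python
# Hypothetical test-suite reward: near ground truth when coverage is good,
# but sparse, since a rollout only scores when the whole suite passes.
import subprocess

def test_suite_reward(repo_dir: str, test_cmd=("pytest", "-q")) -> float:
    """Return 1.0 if the suite passes in repo_dir, else 0.0."""
    result = subprocess.run(list(test_cmd), cwd=repo_dir, capture_output=True)
    return 1.0 if result.returncode == 0 else 0.0
```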
Decompose PRs Into Verifiable Subtasks
- Break large PR tasks into smaller verifiable parts to reduce reward sparsity and improve learning signal.
- Aim for intermediate checks so models get feedback more often than once in a thousand rollouts (see the sketch below).
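A minimal sketch of one way to densify that signal, not Cursor's implementation: score each verifiable subtask of a decomposed PR separately, so partial progress still produces a learning signal instead of only the rare rollout that lands the entire change.

```python
# Hypothetical decomposed reward: average the outcomes of per-subtask checks
# so the model is rewarded for partial progress on a large PR-style task.
def decomposed_reward(subtask_checks, repo_dir: str) -> float:
    """Fraction of subtask checks that pass.

    subtask_checks: hypothetical callables such as migration_applied,
    endpoint_added, or unit_tests_pass, each taking repo_dir and returning
    True/False for one verifiable piece of the decomposed task.
    """
    if not subtask_checks:
        return 0.0
    passed = sum(1 for check in subtask_checks if check(repo_dir))
    return passed / len(subtask_checks)
```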
