Cursor Scales Coding Agents
Link: Scaling long-running autonomous coding
Cursor experimented with hundreds of parallel coding agents running autonomously for extended periods, some for several weeks straight. After failed attempts at agent self-coordination, they landed on a Planner/Worker model: Planners create tasks, Workers execute them, and a Judge decides whether to continue.
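The post doesn't include code, so here's a minimal toy sketch of how such a loop could look, assuming Planner, Worker, and Judge are plain functions over a shared task list. Every name here is hypothetical (Cursor hasn't published its implementation), and the real system drives LLM agents in parallel rather than stubs:

```python
"""Toy sketch of a Planner/Worker/Judge loop. All names are hypothetical;
the SPEC list stands in for a real success criterion like a test suite."""
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

@dataclass
class Task:
    description: str
    done: bool = False

# Stand-in success criterion (hypothetical; Cursor used e.g. conformance tests).
SPEC = ["parse HTML", "build DOM", "apply CSS", "paint layout"]

def planner(completed: set[str]) -> list[Task]:
    """Planner: derive the next batch of tasks from what's still missing."""
    return [Task(item) for item in SPEC if item not in completed][:2]

def worker(task: Task) -> Task:
    """Worker: execute one task; stubbed here to always succeed."""
    task.done = True
    return task

def judge(completed: set[str]) -> bool:
    """Judge: keep going until the success criterion (full coverage) is met."""
    return len(completed) < len(SPEC)

def run() -> set[str]:
    completed: set[str] = set()
    with ThreadPoolExecutor() as pool:  # workers run concurrently, like the parallel agents
        while judge(completed):
            batch = planner(completed)
            for task in pool.map(worker, batch):
                if task.done:
                    completed.add(task.description)
    return completed

if __name__ == "__main__":
    print("completed:", run())
```

The interesting design choice is that coordination lives entirely in the Planner and Judge; Workers never talk to each other, which is presumably what makes this more robust than self-coordination.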
The Examples
- Web Browser from scratch (1M+ LoC, ~1 week) → Web Platform Conformance Tests as success criterion
- Solid→React Migration (+266K/-193K edits, 3+ weeks) → existing behavior must stay the same
- Java LSP (550K LoC) → Language Server Protocol Spec
- Windows 7 Emulator (1.2M LoC) → known behavior
- Excel Clone (1.6M LoC) → known behavior
My Take
These are all projects with clear success criteria: either a spec (Browser, LSP), a test suite, or known behavior to replicate. Coding agents are good at solving “implement the spec.” What I don’t see: how well this works for a novel product, where no such criterion exists yet.
Interesting Quote
“The right amount of structure is somewhere in the middle. Too little structure and agents conflict, duplicate work, and drift. Too much structure creates fragility.”
Funnily enough, I read the same observation in Sutherland’s Scrum handbook, about human teams. The coordination challenges with agents are apparently the same as with humans.
Update: Implied Success
The browser from the example doesn’t build, and apparently never did. Notably, the blog post never explicitly claims it works, but it strongly implies it:
“Cursor never says ‘this browser is production-ready’, but they do frame it as ‘building a web browser from scratch’ and ‘meaningful progress’ and then use a screenshot and ‘extremely difficult’ language, wanting to give the impression that this experiment actually was a success.”
Classic marketing: impressive numbers (1M+ LoC!) and screenshots, but no verifiable results.