Make AI Work in Big Repos: Spec-First Workflow and Frequent Intentional Compaction

AI coding tools can succeed in large, complex codebases today by structuring the entire development process around context control and spec-first artifacts. The author’s “Frequent Intentional Compaction” workflow (research → plan → implement with subagents and aggressive summarization) enables rapid, high-quality changes while keeping teams mentally aligned. It isn’t magic—human review and expertise remain vital—and the hardest part is organizational change, not model capability.
Key Points
- AI can work in large, complex codebases today if you redesign the development process around context engineering, not just prompts.
- Frequent Intentional Compaction (research → plan → implement) keeps context small, correct, and on-trajectory; use subagents to search/summarize without polluting the main context (a minimal sketch follows this list).
- Focus human review on the highest-leverage artifacts—research and plans—to prevent cascades of bad code and maintain mental alignment.
- Real-world results: rapid bug fixes and 35k LOC of features shipped to a 300k LOC Rust repo in hours, with approvals and working demos.
- It’s not magic: engagement and expertise still matter, and the approach can fail if research is shallow; the biggest challenge is organizational and workflow change, not model capability.
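
The loop below is a minimal sketch of the compaction workflow described above, under stated assumptions: `call_model` is a hypothetical stand-in for whatever LLM client you use, and the phase names, word budgets, and prompts are illustrative, not the author's actual tooling.

```python
def call_model(system: str, prompt: str) -> str:
    """Hypothetical LLM wrapper; swap in your provider's client."""
    raise NotImplementedError

def subagent_research(question: str, repo_root: str) -> str:
    # A subagent searches the repo inside its own context window and
    # returns only a compact summary, so raw file contents never enter
    # the main agent's context.
    return call_model(
        system="You are a read-only research agent. Cite file paths.",
        prompt=f"Investigate: {question}\nRepo: {repo_root}\n"
               "Return at most ~500 words of findings.",
    )

def compact(artifact: str, max_words: int = 300) -> str:
    # Intentional compaction: summarize an artifact before it re-enters
    # the main context, keeping the trajectory small and on course.
    return call_model(
        system="Summarize faithfully; keep file paths and decisions.",
        prompt=f"Compress to <= {max_words} words:\n{artifact}",
    )

def run_change(task: str, repo_root: str) -> str:
    # Phase 1: research, delegated to a subagent and then compacted.
    research = compact(subagent_research(task, repo_root))
    # Phase 2: plan. This is the artifact humans should review hardest.
    plan = call_model(
        system="Write a step-by-step implementation plan with file paths.",
        prompt=f"Task: {task}\nResearch summary:\n{research}",
    )
    # Human review/approval of `plan` belongs here, before any code.
    # Phase 3: implement from the compacted plan, not the raw history.
    return call_model(
        system="Implement the plan as a unified diff.",
        prompt=compact(plan, max_words=600),
    )
```

The design point is that each phase hands the next a deliberately small, reviewed artifact instead of an ever-growing transcript.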
Sentiment
The discussion is highly polarized: one significant segment is strongly skeptical of, and outright opposed to, the article's claims and the future of software engineering they imply, while another enthusiastically agrees and shares similarly structured, successful AI-first workflows.
In Agreement
- AI coding tools are effective in large codebases when a structured, spec-first, plan-driven workflow (similar to 'Frequent Intentional Compaction') is adopted, with human review focused on research and plans rather than just code.
- Many users share similar successful workflows involving detailed PRDs, iterative AI code generation, and human verification, finding that AI acts as a powerful tool with engaged human steering.
- The nature of code review is shifting from line-by-line inspection to higher-level verification of specifications and behaviors, enabling higher productivity.
- Using 'ask me clarifying questions' prompts and statically typed languages (like TypeScript or Go) helps improve AI output and catch errors earlier (see the prompt sketch after this list).
- The process, though requiring learning and effort, ultimately leads to significant productivity gains and can smooth over the 'death by a million paper cuts' in complex projects.
- The problems with 'vibecoding' are overcome by rigorous planning and review, an approach that is more about 'abstraction' and 'hyperengineering' than delegation.
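
A minimal sketch of the 'ask me clarifying questions' pattern mentioned in the list above. `call_model` is the same hypothetical LLM wrapper as in the earlier sketch; the prompts are illustrative, not any commenter's exact wording.

```python
def call_model(system: str, prompt: str) -> str:
    """Hypothetical LLM wrapper; swap in your provider's client."""
    raise NotImplementedError

def clarify_then_generate(task: str) -> str:
    # Step 1: the model asks its questions instead of guessing.
    questions = call_model(
        system="Before writing any code, ask the clarifying questions "
               "you need. Output only the questions, one per line.",
        prompt=task,
    )
    # Step 2: a human answers interactively; nothing is generated yet.
    answers = "\n".join(
        f"Q: {q}\nA: {input(q + ' ')}"
        for q in questions.splitlines() if q.strip()
    )
    # Step 3: only now does the model write code, ambiguity resolved.
    return call_model(
        system="Write code that satisfies the task and the clarifications.",
        prompt=f"Task: {task}\n\nClarifications:\n{answers}",
    )
```

The gating step is the point: generation is blocked until the human has resolved the ambiguities the model itself identified.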
Opposed
- There is a lack of concrete, demoable products to substantiate claims of AI's effectiveness in building large, complex systems, with many AI projects reportedly 'falling apart in not-always-obvious ways' or requiring extensive manual fixing.
- AI struggles with the final, crucial 10-20% of complex tasks, and its generated UIs, low-level code, or concurrent logic can be 'weird,' 'garbage,' or fundamentally flawed.
- The cost of advanced AI models and subscriptions is high ($12k/month for one team cited in the thread) and may not be justified by the actual productivity gains, which some argue merely match a skilled human engineer's output or even make users less effective.
- The idea of not reviewing every line of AI-generated code and the acceptance of huge PRs (20k-35k LOC) is seen as 'hostile,' 'disrespectful,' 'unreviewable,' and a 'joke' that degrades the software engineering profession.
- Managing AI's context and inputs can be as much or more work than writing the code manually, leading to a loss of intrinsic motivation for engineers who prefer to solve problems directly.
- AI-generated tests are often poor quality, can encode incorrect assumptions, add unnecessary runtime, and do not provide the same ergonomic feedback as human-written tests.
- AI exhibits bias when verifying its own code, necessitating separate 'red team' agents for review (sketched below), and often fails or 'stops working' when encountering concepts outside its training data.
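
A minimal sketch of the separate 'red team' reviewer some commenters describe: a second model instance with an adversarial system prompt judges a diff it did not write. `call_model` is the same hypothetical wrapper as above, and the BLOCKING convention is an assumption for illustration.

```python
def call_model(system: str, prompt: str) -> str:
    """Hypothetical LLM wrapper; swap in your provider's client."""
    raise NotImplementedError

def red_team_review(diff: str, spec: str) -> str:
    # A fresh model instance with an adversarial prompt, so it carries
    # no "authorship bias" toward the diff it is judging.
    return call_model(
        system="You are an adversarial code reviewer. You did NOT write "
               "this diff. Hunt for spec violations, flawed concurrency, "
               "and tests that encode wrong assumptions. Prefix any "
               "merge-stopping finding with the word BLOCKING.",
        prompt=f"Spec:\n{spec}\n\nDiff under review:\n{diff}",
    )

def gated_merge(diff: str, spec: str) -> bool:
    # Merge only when the independent reviewer raises no blocking issue.
    # The substring check stands in for real structured-output parsing.
    return "BLOCKING" not in red_team_review(diff, spec)
```

Separating author and reviewer roles is the mitigation being proposed; it does not address the training-data limitation also raised above.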