Make AI Work in Big Repos: Spec-First Workflow and Frequent Intentional Compaction

Added Sep 23, 2025

AI coding tools can succeed in large, complex codebases today by structuring the entire development process around context control and spec-first artifacts. The author’s “Frequent Intentional Compaction” workflow (research → plan → implement with subagents and aggressive summarization) enables rapid, high-quality changes while keeping teams mentally aligned. It isn’t magic—human review and expertise remain vital—and the hardest part is organizational change, not model capability.
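A minimal sketch of that three-phase loop, assuming a generic `call_model` helper; the phase prompts and artifact filenames below are illustrative, not the author's actual tooling:

```python
from pathlib import Path

def call_model(prompt: str) -> str:
    """Stand-in for any LLM chat-completion call (API details omitted)."""
    return f"(model output for a prompt of {len(prompt)} chars)"

def run_phase(name: str, instructions: str, inputs: list[Path]) -> Path:
    # Each phase starts from a fresh context: only the compact artifacts
    # from earlier phases are passed along, never the raw transcript.
    context = "\n\n".join(p.read_text() for p in inputs)
    artifact = Path(f"{name}.md")
    artifact.write_text(call_model(f"{instructions}\n\n{context}"))
    return artifact  # a small, human-reviewable document

research = run_phase("research", "Summarize the code paths relevant to the task.", [])
plan = run_phase("plan", "Write a step-by-step implementation plan.", [research])
# Humans review research.md and plan.md before any code is generated.
patch = run_phase("implement", "Apply the plan one step at a time.", [research, plan])
```

The point of the structure is that each phase's output is small enough for a human to review and becomes the only context the next phase inherits.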

Key Points

  • AI can work in large, complex codebases today if you redesign the development process around context engineering, not just prompts.
  • Frequent Intentional Compaction (research → plan → implement) keeps context small, correct, and on-trajectory; use subagents to search and summarize without polluting the main context (see the sketch after this list).
  • Focus human review on the highest-leverage artifacts—research and plans—to prevent cascades of bad code and maintain mental alignment.
  • Real-world results: rapid bug fixes and 35k LOC of features shipped to a 300k LOC Rust repo in hours, with approvals and working demos.
  • It’s not magic: engagement and expertise still matter, and the approach can fail if research is shallow; the biggest challenge is organizational and workflow change, not model capability.
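Reading the subagent point concretely: the subagent spends its own context window on raw search output and hands only a compact summary back to the parent. The `summarize` stand-in below is an assumption; a real setup would make a separate model call there.

```python
import subprocess

def summarize(text: str, limit: int = 400) -> str:
    """Stand-in for a subagent's model call that compacts text;
    truncation keeps the sketch self-contained."""
    return text[:limit]

def search_subagent(pattern: str, repo: str) -> str:
    # The raw grep output may be enormous; it lives only inside this
    # function's scope and never enters the main agent's context.
    raw = subprocess.run(
        ["grep", "-rn", pattern, repo],
        capture_output=True,
        text=True,
    ).stdout
    return summarize(f"Matches for {pattern!r}:\n{raw}")

# The main context grows only by the size of the summary.
main_context = [search_subagent("compaction", "./src")]
```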

Sentiment

The discussion is highly polarized: a significant segment expresses strong skepticism and outright opposition to the article's claims and the future of software engineering they imply, while another segment enthusiastically agrees and reports similarly structured, successful AI-first workflows.

In Agreement

  • AI coding tools are effective in large codebases when a structured, spec-first, plan-driven workflow (similar to 'Frequent Intentional Compaction') is adopted, with human review focused on research and plans rather than just code.
  • Many users share similar successful workflows involving detailed PRDs, iterative AI code generation, and human verification, finding that AI acts as a powerful tool with engaged human steering.
  • The nature of code review is shifting from line-by-line inspection to higher-level verification of specifications and behaviors, enabling higher productivity.
  • Using 'ask me clarifying questions' prompts and statically typed languages (like TypeScript or Go) helps improve AI output and catch errors earlier; an example preamble follows this list.
  • The process, though requiring learning and effort, ultimately leads to significant productivity gains and can smooth over the 'death by a million paper cuts' in complex projects.
  • The problems with 'vibecoding' are overcome by rigorous planning and review, which is more about 'abstraction' and 'hyperengineering' than delegation.
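For the clarifying-questions point above, one illustrative preamble; the exact wording is an assumption, not a quote from the thread:

```python
# A hypothetical prompt preamble prepended to any implementation task.
CLARIFY_PREAMBLE = (
    "Before writing any code, ask me clarifying questions about anything "
    "ambiguous in this spec: edge cases, error handling, naming, and which "
    "existing modules to reuse. Wait for my answers before implementing."
)

task = "Add rate limiting to the upload endpoint."
prompt = f"{CLARIFY_PREAMBLE}\n\nTask: {task}"
```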

Opposed

  • There is a lack of concrete, demoable products to substantiate claims of AI's effectiveness in building large, complex systems, with many AI projects reportedly 'falling apart in not-always-obvious ways' or requiring extensive manual fixing.
  • AI struggles with the final, crucial 10-20% of complex tasks, and its generated UIs, low-level code, or concurrent logic can be 'weird,' 'garbage,' or fundamentally flawed.
  • The cost of advanced AI models and subscriptions is high ($12k/month for a team) and may not be justified by the actual productivity gains, which some argue are comparable to a skilled human engineer or even make users less effective.
  • The idea of not reviewing every line of AI-generated code and the acceptance of huge PRs (20k-35k LOC) is seen as 'hostile,' 'disrespectful,' 'unreviewable,' and a 'joke' that degrades the software engineering profession.
  • Managing AI's context and inputs can be as much or more work than writing the code manually, leading to a loss of intrinsic motivation for engineers who prefer to solve problems directly.
  • AI-generated tests are often poor quality, can encode incorrect assumptions, add unnecessary runtime, and do not provide the same ergonomic feedback as human-written tests.
  • AI exhibits bias when verifying its own code, necessitating separate 'red team' agents for review, and often fails or 'stops working' when encountering concepts outside its training data.