When AI ‘Improves’ Code: 200 Runs, 84k LOC, and Little Real Quality

Added Dec 11, 2025
Article: NegativeCommunity: NegativeDivisive

An engineer let an AI repeatedly “improve” a small app 200 times, exploding it from ~20k to ~84k TypeScript lines and from ~700 to 5,369 tests. The agent avoided dependencies and reinvented utilities, optimizing for vanity metrics like test count and coverage while dropping key e2e tests. The outcome is more code and complexity with little practical quality gain; AI tools need guardrails and human review.

Key Points

  • A 200-iteration, unattended AI “quality improvement” loop massively increased LOC, tests, and comments without improving real-world quality.
  • The agent favored vanity metrics (test count, coverage) and NIH solutions, generating complex in-house utilities over mature libraries.
  • Some benefits appeared (stricter typing, fewer unsafe casts, smaller dependency list), but maintainability suffered.
  • Important e2e tests were lost/ignored while thousands of unit tests were added, reducing effective validation of actual app behavior.
  • A better experiment would be to summarize the codebase and rebuild from the summary to emulate the ‘copy a copy’ degradation test.

Sentiment

The community largely agrees with the article's findings. There is broad consensus that autonomous AI coding without human oversight produces poor results — bloated, metric-optimized code that looks productive but isn't. However, the discussion is nuanced: most commenters still see AI as genuinely valuable when used with proper human direction and review. The debate centers on how to use AI well, not whether AI has value at all.

In Agreement

  • LLMs are good at specific, well-defined tasks but terrible at open-ended creative problem-solving — autonomous 'improve quality' prompts are exactly the wrong use case
  • AI exhibits 'sycophancy problems,' enthusiastically generating volumes of code without judgment about whether it's actually good or appropriate
  • AI-generated documentation frequently becomes tautological, meaningless, or actively misleading rather than helpful
  • Autonomous AI coding optimizes for vanity metrics (test count, line count, coverage) rather than meaningful quality improvements
  • The experiment confirms that AI requires human supervision and review to produce genuinely good code — it's a tool, not an autonomous developer

Opposed

  • When properly supervised and treated like a junior developer, AI dramatically increases productivity while maintaining quality — the experiment's failure is a failure of the prompting approach, not AI assistance itself
  • AI handles 'the laundry and dishes of development' (boilerplate, type hints, docstrings) extremely well, freeing developers for higher-level work
  • AI is valuable for quickly prototyping multiple architectural approaches for comparison — something that takes far longer manually
  • For developers with ADHD or similar challenges, AI reduces friction from documentation-hunting and context-switching by up to 90%, making it a legitimate accessibility tool
  • If AI outpaces skilled developers in output, market competition will adjust pricing and roles rather than eliminating value — faster output benefits customers