The Myth of Specification-Generated Code

The author argues that the promise of generating code from high-level specifications is a fallacy, because any spec detailed enough to work is effectively just code. Examining OpenAI's Symphony project, she shows how these 'specs' are often just AI-generated pseudocode that fails to produce reliable results in practice. Ultimately, she maintains that there are no shortcuts to the precision software engineering requires, and that low-effort specs yield low-quality code.
Key Points
- A specification precise enough to generate working code must necessarily be as complex and structured as code itself.
- OpenAI's Symphony project is a misleading example of agentic coding because its specification document consists of pseudocode, database schemas, and 'cheat sheets' for the AI.
- The industry's focus on delivery speed has turned specification writing into a source of AI-generated 'slop' rather than a contemplative engineering practice.
- AI agents struggle to generalize beyond their training data and fail to reliably implement software from specifications, especially in non-mainstream languages.
- The labor of precision cannot be bypassed; transmuting code into a verbal specification only changes the medium without reducing the required effort.
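The first key point can be made concrete with a small illustration (my own sketch, not an example from the article): once an English "spec" for even a trivial task pins down every edge case, it maps line for line onto the code. The function name `dedupe_keep_first` and the task itself are hypothetical, chosen only to show the convergence.

```python
# Spec, stated precisely enough to be unambiguous:
#   "Return the input's elements with duplicates removed, keeping the
#    first occurrence of each element, preserving original order, and
#    comparing elements by equality (they must be hashable)."
#
# Note how each clause of the spec reappears as a line of code below.

def dedupe_keep_first(items):
    seen = set()
    result = []
    for item in items:
        if item not in seen:      # "keeping the first occurrence"
            seen.add(item)        # "comparing elements by equality (hashable)"
            result.append(item)   # "preserving original order"
    return result

print(dedupe_keep_first([3, 1, 3, 2, 1]))  # → [3, 1, 2]
```

A vaguer spec ("remove duplicates from a list") leaves the ordering and equality questions open, and any wording that closes them ends up as structured and exacting as the loop above.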
Sentiment
The community broadly agrees with the article's thesis. Most commenters affirm that specifications written in natural language cannot substitute for code, that human judgment remains essential in software development, and that LLMs are useful assistive tools but not reliable autonomous code generators. A vocal minority argues the article is too pessimistic about LLMs' practical utility for standard software, and some accuse skeptics of motivated reasoning driven by career anxiety. The overall tone is thoughtful and technical rather than hostile.
In Agreement
- Natural language is inherently too imprecise to define a program; formal specifications capable of ensuring correctness are essentially code themselves.
- LLMs can only reliably reproduce patterns already in their training data and cannot genuinely generalize beyond it, as demonstrated by their difficulty with less common languages like Haskell.
- Human developers bring irreplaceable capabilities: pushing back on faulty specs, exercising prudential judgment, introspecting on user needs, and being held accountable for decisions.
- Economically valuable software is precisely the kind that diverges from standard patterns, which is where LLMs struggle most.
- Attempts to create more precise, less ambiguous specification languages for LLMs inevitably converge on reinventing programming languages, validating the article's thesis.
- Even heavy daily users of AI coding tools report regularly catching significant errors, misinterpreted requirements, and ignored instructions.
Opposed
- LLMs can and do fill in reasonable implementation details from terse descriptions of well-known patterns, which constitutes genuine value even if imperfect.
- Most commercial software is a permutation of existing standard components, which LLMs handle well enough for practical purposes.
- Using Haskell as the test case is unfair, since its paradigms differ radically from mainstream languages and it has limited training data.
- Critics of AI coding are engaging in anxious rationalization driven by fear of job displacement rather than objective technical assessment.
- Spec-driven approaches with waterfall-style design docs do work effectively for maintaining functionality across many agentic iterations.