Code Over Prose: The Case for Deterministic AI Agents
The author contends that prompt engineering is an insufficient foundation for complex AI agents because it lacks the determinism of traditional software. Instead, agents should be built using deterministic control flows and validation checkpoints that treat the LLM as a modular component. This transition from prose-based logic to programmatic orchestration is essential to prevent silent failures and ensure scalability.
Key Points
- Prompting is inherently non-deterministic and lacks the predictable behavior needed to scale complex software.
- Reliable agents require deterministic scaffolds and validation checkpoints that treat the LLM as a single component of a larger system.
- Software engineering principles like recursive composability are necessary to manage the complexity of agentic reasoning.
- Aggressive programmatic error detection is vital to prevent agents from reaching incorrect conclusions through silent failures.
- Current non-deterministic approaches force a reliance on human 'babysitters' or post-run auditors, which limits scalability.
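To make the scaffold and checkpoint idea in the points above concrete, here is a minimal sketch rather than the author's implementation. It assumes the model is exposed as a plain `llm(prompt) -> str` callable and that the task is extracting an invoice total as JSON. The names `run_with_checkpoint`, `validate_invoice`, and `ValidationError` are hypothetical, chosen only for illustration.

```python
import json
from typing import Callable


class ValidationError(Exception):
    """Raised when no model output passes the checkpoint within the retry budget."""


def validate_invoice(raw: str) -> dict:
    """Checkpoint (hypothetical schema): parse and check the output in code."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    total = data.get("total")
    # bool is a subclass of int, so reject it explicitly
    if isinstance(total, bool) or not isinstance(total, (int, float)) or total < 0:
        raise ValueError("'total' must be a non-negative number")
    return data


def run_with_checkpoint(llm: Callable[[str], str], prompt: str,
                        max_attempts: int = 3) -> dict:
    """The loop and the retry budget live in code; the model is one component."""
    last_error = None
    for _ in range(max_attempts):
        raw = llm(prompt)
        try:
            return validate_invoice(raw)
        except ValueError as exc:
            last_error = exc
            # Feed the rejection reason back to the model for the next attempt.
            prompt = f"{prompt}\n\nPrevious output was rejected: {exc}"
    # Fail loudly instead of passing unvalidated output downstream.
    raise ValidationError(
        f"no valid output after {max_attempts} attempts: {last_error}"
    )


# Usage with a stub standing in for a real model call (an assumption for the demo):
replies = iter(["not json", '{"total": 42.5}'])
result = run_with_checkpoint(lambda p: next(replies),
                             "Extract the invoice total as JSON.")
print(result)  # {'total': 42.5}
```

The design choice worth noting is the final `raise`: when every attempt fails the check, the harness stops with an explicit error, which is one way to avoid the silent failures described above.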
Sentiment
The community overwhelmingly agrees with the article's core thesis. Practitioners share extensive real-world evidence that deterministic harnesses dramatically improve agent reliability. The few dissenting voices generally agree with the underlying problem but propose alternative solutions like multi-agent architectures or better task decomposition rather than defending prompt-only approaches. There is a strong undercurrent of frustration with AI labs pushing prompt-centric workflows for commercial reasons.
In Agreement
- Practitioners report dramatic reliability improvements when wrapping LLM agents in deterministic harnesses with explicit loops and validation checkpoints
- Smaller, cheaper models can match frontier model performance when tasks are decomposed into focused units within deterministic control flow
- AI labs have a financial incentive to push prompt-heavy workflows over harness-based approaches because they sell tokens
- Using 'MANDATORY' or 'DO NOT SKIP' in prompts is a code smell indicating logic that should be moved into code
- The breakthrough in AI coding came from moving process execution into the harness rather than from intelligence improvements alone
- Quality gates and deterministic validation make LLM output reliable in the same way pre-commit hooks and TDD constrain human developers
- Prompts are suggestions, not instructions: LLMs fundamentally cannot guarantee compliance with prompt directives
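One way to read the 'MANDATORY' and quality-gate points above is that step ordering and pass/fail checks belong to the harness, so the model cannot skip them. The following is a sketch under that assumption. `Step`, `GateFailure`, and `run_pipeline` are hypothetical names, and the lambdas are stubs standing in for real model calls and a real test runner.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    name: str
    run: Callable[[dict], dict]   # may call a model; returns the updated state
    gate: Callable[[dict], bool]  # deterministic check on that state


class GateFailure(Exception):
    """Raised when a step's output does not pass its quality gate."""


def run_pipeline(steps: List[Step], state: dict) -> dict:
    """Run every step in order; the harness, not the prompt, enforces the gates."""
    for step in steps:
        state = step.run(state)
        if not step.gate(state):
            raise GateFailure(f"step '{step.name}' failed its gate")
    return state


# Stub steps (assumptions for the demo): a real harness would call a model in
# 'draft_patch' and execute an actual test suite in 'run_tests'.
steps = [
    Step("draft_patch",
         run=lambda s: {**s, "patch": "fix off-by-one"},
         gate=lambda s: bool(s.get("patch"))),
    Step("run_tests",
         run=lambda s: {**s, "tests_passed": True},
         gate=lambda s: s.get("tests_passed") is True),
]
final = run_pipeline(steps, {"task": "fix bug"})
print(final["tests_passed"])  # True
```

Here the instruction "always run the tests" never appears in a prompt. It is simply the second element of `steps`, much as a pre-commit hook runs whether or not the developer remembers it.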
Opposed
- Multi-agent architectures with supervisor, orchestrator, and worker agents can achieve reliability without rigid deterministic control flow
- The problem is overloaded agent context, not non-determinism: separating concerns across focused agents with limited context resolves most issues
- Thinking about agents declaratively rather than imperatively is more productive; people are trying to use the wrong tool for deterministic tasks
- Too much deterministic control flow makes systems rigid and fragile with edge cases; some flexibility is needed for dynamic situations
- LLM errors can be quantified as stochastic processes, making each use case a risk-reward tradeoff rather than an absolute reliability requirement
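The risk-reward framing in the last point can be illustrated with some simple arithmetic. This is a toy model, not something taken from the discussion: it assumes each step succeeds independently with the same probability and that a validator catches every bad output, neither of which is guaranteed in practice. All function names and numbers are hypothetical.

```python
def chain_success(p_step: float, n_steps: int) -> float:
    """Probability that n independent steps all succeed (independence assumed)."""
    return p_step ** n_steps


def step_success_with_retries(p_step: float, attempts: int) -> float:
    """Per-step success when a perfect validator allows up to `attempts` tries."""
    return 1.0 - (1.0 - p_step) ** attempts


def expected_value(p_success: float, gain: float, loss: float) -> float:
    """Expected payoff of running the agent once, given its success probability."""
    return p_success * gain - (1.0 - p_success) * loss


# Illustrative numbers only: 95% per-step reliability over a 10-step task.
unchecked = chain_success(0.95, 10)
checked = chain_success(step_success_with_retries(0.95, 3), 10)
print(f"unchecked: {unchecked:.4f}, with validated retries: {checked:.4f}")
print(f"expected value, unchecked: {expected_value(unchecked, 100.0, 500.0):.1f}")
```

Under these assumptions the unchecked chain completes only about 60% of the time, so whether that is acceptable depends on the cost of a failure relative to the gain, which is the tradeoff this view points to.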