Code Over Prose: The Case for Deterministic AI Agents

The author contends that prompt engineering is an insufficient foundation for complex AI agents because it lacks the determinism of traditional software. Instead, agents should be built using deterministic control flows and validation checkpoints that treat the LLM as a modular component. This transition from prose-based logic to programmatic orchestration is essential to prevent silent failures and ensure scalability.

Key Points

Prompting is inherently non-deterministic and lacks the predictable behavior needed to scale complex software.
Reliable agents require deterministic scaffolds and validation checkpoints that treat the LLM as a single component of a larger system.
Software engineering principles like recursive composability are necessary to manage the complexity of agentic reasoning.
Aggressive programmatic error detection is vital to prevent agents from reaching incorrect conclusions through silent failures.
Current non-deterministic approaches force a reliance on human 'babysitters' or post-run auditors, which limits scalability.
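
As a minimal sketch of the scaffold-plus-checkpoint idea in the points above, the Python below wraps each step in a validator that raises instead of passing bad output along, and composes steps so that a pipeline is itself a step. All names here (checked, sequence, CheckpointError, fake_llm_extract_total) are hypothetical, and the "model" is a plain stand-in function rather than a real LLM call.

```python
from typing import Callable

# A step is anything that maps text to text: a model call, a parser, a pipeline.
Step = Callable[[str], str]


class CheckpointError(RuntimeError):
    """Raised when a step's output fails its validation checkpoint."""


def checked(name: str, step: Step, is_valid: Callable[[str], bool]) -> Step:
    """Wrap a step so invalid output stops the run instead of failing silently."""
    def run(text: str) -> str:
        result = step(text)
        if not is_valid(result):
            raise CheckpointError(f"{name}: output failed validation: {result!r}")
        return result
    return run


def sequence(*steps: Step) -> Step:
    """Compose steps in a fixed order; the result is itself a Step."""
    def run(text: str) -> str:
        for step in steps:
            text = step(text)
        return text
    return run


def fake_llm_extract_total(text: str) -> str:
    """Stand-in for a model call (assumption): keeps only the digits it sees."""
    return "".join(ch for ch in text if ch.isdigit())


extract = checked("extract_total", fake_llm_extract_total, str.isdigit)
double = checked("double", lambda s: str(int(s) * 2), str.isdigit)

pipeline = sequence(extract, double)   # two checked steps in a fixed order
nested = sequence(pipeline, double)    # a pipeline reused as a single step
```

Calling pipeline("total: 21") runs both steps in order, while pipeline("no numbers here") raises CheckpointError at the first checkpoint because the stand-in model returns an empty string. The control flow lives in ordinary code; the model only fills in one slot.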

Sentiment

The community overwhelmingly agrees with the article's core thesis. Practitioners share extensive real-world evidence that deterministic harnesses dramatically improve agent reliability. The few dissenting voices generally agree with the underlying problem but propose alternative solutions like multi-agent architectures or better task decomposition rather than defending prompt-only approaches. There is a strong undercurrent of frustration with AI labs pushing prompt-centric workflows for commercial reasons.

In Agreement

Practitioners report dramatic reliability improvements when wrapping LLM agents in deterministic harnesses with explicit loops and validation checkpoints
Smaller, cheaper models can match frontier model performance when tasks are decomposed into focused units within deterministic control flow
AI labs have a financial incentive to push prompt-heavy workflows over harness-based approaches because they sell tokens
Writing "MANDATORY" or "DO NOT SKIP" in a prompt is a code smell: it marks logic that should be moved into code
The breakthrough in AI coding came from moving process execution into the harness rather than from intelligence improvements alone
Quality gates and deterministic validation make LLM output reliable in the same way pre-commit hooks and TDD constrain human developers
Prompts are suggestions, not instructions: LLMs fundamentally cannot guarantee compliance with prompt directives

Opposed

Multi-agent architectures with supervisor, orchestrator, and worker agents can achieve reliability without rigid deterministic control flow
The problem is overloaded agent context, not non-determinism — separating concerns across focused agents with limited context resolves most issues
Thinking about agents declaratively rather than imperatively is more productive; people are trying to use the wrong tool for deterministic tasks
Too much deterministic control flow makes systems rigid and fragile when they hit edge cases; some flexibility is needed for dynamic situations
LLM errors can be quantified as stochastic processes, making each use case a risk-reward tradeoff rather than an absolute reliability requirement
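
To make the risk-reward framing in the last point concrete, here is a rough sketch under strong simplifying assumptions: every step succeeds independently with the same probability, and any retry is guarded by a validator that never misjudges a result. Neither assumption comes from the discussion, and the function names and the numbers in the comments are illustrative only.

```python
def run_success_probability(
    step_success: float, steps: int, attempts_per_step: int = 1
) -> float:
    """Chance that every step of a run succeeds, assuming independent steps.

    With validated retries, a step fails only if all of its attempts fail.
    """
    per_step = 1.0 - (1.0 - step_success) ** attempts_per_step
    return per_step ** steps


def expected_value(p_success: float, reward: float, failure_cost: float) -> float:
    """Expected payoff of one run: reward on success, cost on failure."""
    return p_success * reward - (1.0 - p_success) * failure_cost


# Illustrative numbers: 20 steps at 95% each, without and with up to 3 attempts.
single_try = run_success_probability(0.95, 20)        # roughly 0.36
with_retries = run_success_probability(0.95, 20, 3)   # roughly 0.997
```

Under these assumptions, whether a given success rate is acceptable depends on the reward and failure cost plugged into expected_value, which is the tradeoff this position describes.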