Stop Building Multi-Agents: Context Engineering for Reliable LLM Agents

Added Sep 2, 2025

Multi-agent LLM systems are fragile because they disperse decision-making and fail to share full context among agents. Walden Yan proposes two principles (share the full context and trace; treat every action as carrying implicit decisions) and recommends single-threaded agents for reliability. For long-running tasks, add a strong summarization layer to compress the trace rather than fanning out to parallel subagents.

Key Points

  • Principle 1: Share full context and traces—partial messages are insufficient for reliable decisions.
  • Principle 2: Actions encode implicit decisions; conflicting actions from poorly aligned agents produce bad results.
  • Prefer single-threaded, linear agents for reliability; add summarization/compression to handle long contexts.
  • Multi-agent architectures are currently fragile because context and decisions cannot be shared robustly across agents.
  • Real-world patterns (e.g., Claude Code subagents, the move away from edit-apply splits) reinforce keeping decision-making unified.
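The single-threaded pattern above can be sketched as a loop that keeps one linear trace and compresses older turns when the context grows too large. This is a minimal illustration, not the post's implementation: `call_llm`, `run_tool`, and `summarize` are hypothetical stand-ins, and the token budget and 4-chars-per-token heuristic are illustrative assumptions.

```python
# Sketch of a single-threaded agent with a summarization layer.
# All function names and thresholds here are illustrative assumptions.
from dataclasses import dataclass, field

MAX_CONTEXT_TOKENS = 50_000  # illustrative budget, echoing the drift point below


def estimate_tokens(text: str) -> int:
    return len(text) // 4  # rough heuristic: ~4 characters per token


@dataclass
class Agent:
    trace: list = field(default_factory=list)  # the full, linear decision trace

    def step(self, user_input, call_llm, run_tool, summarize):
        self.trace.append(("user", user_input))
        if sum(estimate_tokens(str(t)) for t in self.trace) > MAX_CONTEXT_TOKENS:
            # Compress older turns instead of spawning a parallel subagent:
            # the summary replaces them but stays inside the single trace.
            head, tail = self.trace[:-4], self.trace[-4:]
            self.trace = [("summary", summarize(head))] + tail
        action = call_llm(self.trace)  # every decision sees the whole trace
        if action.get("tool"):
            result = run_tool(action["tool"], action.get("args", {}))
            self.trace.append(("tool", result))
        self.trace.append(("assistant", action.get("reply", "")))
        return action
```

The key design choice is that compression rewrites the one shared trace in place; no decision is ever made against a partial view that a sibling agent cannot see.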

Sentiment

Mixed but leaning supportive: many agree single-agent plus rigorous context engineering is most reliable today, while a sizable minority argues multi-agent can work with strong constraints, orchestration, and isolation.

In Agreement

  • Single-agent architectures with carefully engineered, linear context are more reliable than free-form multi-agent systems today.
  • Subagents are useful only as bounded tools (e.g., research/Q&A) to avoid polluting the main context; keep their deliberation out of the primary trace.
  • Context management is the core problem: dilution happens well before the window fills, so summarization/compression and selective inclusion are essential.
  • Pipelines beat freely communicating agents for many tasks; merging decision-making and execution in one step reduces brittleness.
  • Structured/constrained generation and restricted outputs improve reliability and mitigate prompt injection risk.
  • Empirical observation: beyond roughly 50k tokens of context, agents begin to drift from their goals; a single agent with good prompting often outperforms multi-agent orchestration.
  • Human-in-the-loop oversight remains important; unchecked autonomous multi-agents are error-prone.
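The structured/restricted-output point above can be made concrete with a validator that only executes actions from a whitelist. This is a hedged sketch: the action names, schema shape, and `parse_action` helper are illustrative assumptions, not from the post.

```python
# Restrict agent outputs to a whitelist of actions with typed arguments.
# Action names and schemas below are hypothetical examples.
import json

ALLOWED_ACTIONS = {
    "search": {"query": str},
    "read_file": {"path": str},
    "reply": {"text": str},
}


def parse_action(raw: str) -> dict:
    """Reject anything outside the allowed schema, so an injected
    instruction in retrieved text cannot trigger an unlisted tool."""
    action = json.loads(raw)
    name = action.get("name")
    schema = ALLOWED_ACTIONS.get(name)
    if schema is None:
        raise ValueError(f"action {name!r} not permitted")
    args = action.get("args", {})
    for key, typ in schema.items():
        if not isinstance(args.get(key), typ):
            raise ValueError(f"argument {key!r} missing or wrong type")
    extra = set(args) - set(schema)
    if extra:
        raise ValueError(f"unexpected arguments: {extra}")
    return {"name": name, "args": args}
```

Because the validator runs deterministically between the model and the executor, a prompt-injected "action" fails closed instead of being executed.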

Opposed

  • Separate agents for distinct rule sets (e.g., code editing vs. code search) reduce instruction conflicts; role-specific agents can outperform a single mixed-context agent.
  • Critic/reviewer subagents should not inherit context to remain unbiased; inheriting full traces can be harmful.
  • Parallel subagents can work when isolated by workspace boundaries and guarded by human review or orchestration, enabling faster progress on disjoint components.
  • The term “agent” is often just a prompt template; better to conceptualize systems as tools with clear interfaces and asynchronous execution rather than forbidding multi-agent designs.
  • Build better orchestration: agents emit intents to queues; executors act deterministically; shared knowledge stores and context optimizers can make multi-agent designs viable.
  • Less context can be more; selective omission and first-impression tools can outperform full-trace sharing, contradicting a blanket call to share everything.
  • We’re early; don’t follow dogma—experiment and do what works for your use case, as best practices are still evolving.
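The "agents emit intents to queues; executors act deterministically" proposal above can be sketched with a small orchestrator. The `Intent` and `Orchestrator` names and the shared log are illustrative assumptions about what such a design might look like, under the stated constraint that agents never act directly.

```python
# Sketch of intent-queue orchestration: agents emit intents, a single
# deterministic executor layer acts on them. Names here are hypothetical.
from collections import deque
from dataclasses import dataclass


@dataclass
class Intent:
    agent_id: str
    action: str
    payload: dict


class Orchestrator:
    def __init__(self, executors):
        self.queue = deque()
        self.executors = executors  # action name -> deterministic function
        self.log = []               # shared record every agent can read

    def emit(self, intent: Intent):
        self.queue.append(intent)   # agents only enqueue; they never execute

    def drain(self):
        while self.queue:
            intent = self.queue.popleft()
            handler = self.executors.get(intent.action)
            if handler is None:
                self.log.append((intent.agent_id, intent.action, "rejected"))
                continue
            result = handler(intent.payload)  # one deterministic execution path
            self.log.append((intent.agent_id, intent.action, result))
        return self.log
```

The shared log plays the role of the "shared knowledge store" commenters describe: every agent's intent and outcome is recorded in one place, which is the constraint that makes this multi-agent variant arguably viable.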