Stop Building Multi-Agents: Context Engineering for Reliable LLM Agents
Added September 2, 2025

Multi-agent LLM systems are fragile because they disperse decisions and fail to share full context. Walden Yan proposes two principles—share full traces and treat actions as implicit decisions—and recommends single-threaded agents for reliability. To handle long tasks, add a strong summarization layer rather than parallel subagents.
Key Points
- Principle 1: Share full context and traces—partial messages are insufficient for reliable decisions.
- Principle 2: Actions encode implicit decisions; conflicting actions from poorly aligned agents produce bad results.
- Prefer single-threaded, linear agents for reliability; add summarization/compression to handle long contexts.
- Multi-agent architectures are currently fragile because context and decisions cannot be shared robustly across agents.
- Real-world patterns (e.g., Claude Code subagents, the move away from edit-apply splits) reinforce keeping decision-making unified.
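The single-threaded pattern the article recommends can be sketched in a few lines. This is a hypothetical illustration, not code from the original post: `Trace`, `run_agent`, and the `llm` callable are invented names, and the loop simply shows the core invariant that every decision is made against the full, linear trace rather than a partial view.

```python
from dataclasses import dataclass, field

@dataclass
class Trace:
    """A single linear record of everything the agent has seen and done."""
    events: list[str] = field(default_factory=list)

    def add(self, event: str) -> None:
        self.events.append(event)

    def render(self) -> str:
        # The whole trace is always passed to the model, never a slice of it.
        return "\n".join(self.events)

def run_agent(task: str, llm, max_steps: int = 10) -> str:
    """One agent, one decision point per step, full context at every step."""
    trace = Trace()
    trace.add(f"TASK: {task}")
    for _ in range(max_steps):
        # Single-threaded: no subagent ever acts on a partial message.
        action = llm(trace.render())
        trace.add(f"ACTION: {action}")
        if action.startswith("DONE"):
            return action
    return trace.events[-1]
```

Because actions encode implicit decisions (Principle 2), serializing them into one trace means later steps can see, and stay consistent with, every earlier choice.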
Sentiment
Mixed but leaning supportive: many agree single-agent plus rigorous context engineering is most reliable today, while a sizable minority argues multi-agent can work with strong constraints, orchestration, and isolation.
In Agreement
- Single-agent architectures with carefully engineered, linear context are more reliable than free-form multi-agent systems today.
- Subagents are useful only as bounded tools (e.g., research/Q&A) to avoid polluting the main context; keep their deliberation out of the primary trace.
- Context management is the core problem: dilution happens well before the window fills, so summarization/compression and selective inclusion are essential.
- Pipelines beat freely communicating agents for many tasks; merging decision-making and execution in one step reduces brittleness.
- Structured/constrained generation and restricted outputs improve reliability and mitigate prompt injection risk.
- Empirical evidence: beyond roughly 50k tokens of context, agents begin drifting from their goals; a single agent with good prompting often outperforms multi-agent orchestration.
- Human-in-the-loop oversight remains important; unchecked autonomous multi-agents are error-prone.
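The summarization/compression layer that commenters describe might look like the sketch below. All names here are assumptions for illustration: the 4-characters-per-token heuristic, the `budget` default of 50,000 (echoing the drift threshold reported above), and the `summarize` callable are placeholders, not anything specified in the article.

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: roughly 4 characters per token for English text.
    return len(text) // 4

def compress_trace(events: list[str], summarize, budget: int = 50_000,
                   keep_recent: int = 5) -> list[str]:
    """Replace older trace events with a model-written summary once the
    estimated token count crosses the budget; recent events stay verbatim."""
    total = sum(estimate_tokens(e) for e in events)
    if total <= budget or len(events) <= keep_recent:
        return events
    old, recent = events[:-keep_recent], events[-keep_recent:]
    summary = summarize("\n".join(old))
    return [f"SUMMARY OF EARLIER STEPS: {summary}"] + recent
```

Triggering compression well below the model's actual context window matches the observation that dilution sets in long before the window fills.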
Opposed
- Separate agents for distinct rule sets (e.g., code editing vs. code search) reduce instruction conflicts; role-specific agents can outperform a single mixed-context agent.
- Critic/reviewer subagents should not inherit context to remain unbiased; inheriting full traces can be harmful.
- Parallel subagents can work when isolated by workspace boundaries and guarded by human review or orchestration, enabling faster progress on disjoint components.
- The term “agent” is often just a prompt template; better to conceptualize systems as tools with clear interfaces and asynchronous execution rather than forbidding multi-agent designs.
- Build better orchestration: agents emit intents to queues; executors act deterministically; shared knowledge stores and context optimizers can make multi-agent designs viable.
- Less context can be more; selective omission and first-impression tools can outperform full-trace sharing, contradicting a blanket call to share everything.
- We’re early; don’t follow dogma—experiment and do what works for your use case, as best practices are still evolving.
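The "agents emit intents to queues; executors act deterministically" proposal can be made concrete with a small sketch. This is one possible reading of that comment, with invented names (`agent_propose`, `execute_all`), not a design from the article: agents never act directly, and a single serialized executor becomes the one place where conflicting actions get resolved.

```python
import queue

# Shared FIFO of declarative intents; agents write, only the executor acts.
intents: queue.Queue = queue.Queue()

def agent_propose(agent_id: str, intent: dict) -> None:
    """Agents describe what they want done; they never perform the action."""
    intents.put({"agent": agent_id, **intent})

def execute_all(apply) -> list[dict]:
    """Drain the queue in order through one deterministic executor, giving a
    single decision point with full visibility over every proposed action."""
    applied = []
    while not intents.empty():
        intent = intents.get()
        apply(intent)
        applied.append(intent)
    return applied
```

Serializing execution this way recovers some of the single-threaded reliability the article argues for, while still letting multiple role-specific agents propose work in parallel.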