Stop Building Multi-Agents: Context Engineering for Reliable LLM Agents

Added Sep 2, 2025

Multi-agent LLM systems are fragile because they disperse decision-making and fail to share full context among agents. Walden Yan proposes two principles (share the full context and trace; treat every action as carrying implicit decisions) and recommends single-threaded agents for reliability. For long-running tasks, add a strong summarization layer to compress the trace rather than fanning out to parallel subagents.

Key Points

  • Principle 1: Share full context and traces—partial messages are insufficient for reliable decisions.
  • Principle 2: Actions encode implicit decisions; conflicting actions from poorly aligned agents produce bad results.
  • Prefer single-threaded, linear agents for reliability; add summarization/compression to handle long contexts.
  • Multi-agent architectures are currently fragile because context and decisions cannot be shared robustly across agents.
  • Real-world patterns (e.g., Claude Code subagents, the move away from edit-apply splits) reinforce keeping decision-making unified.
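The single-threaded pattern above can be sketched as a loop that keeps one linear trace and compresses older turns when the context grows too large. This is a minimal illustration, not the post's implementation: `call_llm`, `run_tool`, and `summarize` are hypothetical stand-ins, and the token budget and 4-chars-per-token heuristic are illustrative assumptions.

```python
# Sketch of a single-threaded agent with a summarization layer.
# All function names and thresholds here are illustrative assumptions.
from dataclasses import dataclass, field

MAX_CONTEXT_TOKENS = 50_000  # illustrative budget, echoing the drift point below


def estimate_tokens(text: str) -> int:
    return len(text) // 4  # rough heuristic: ~4 characters per token


@dataclass
class Agent:
    trace: list = field(default_factory=list)  # the full, linear decision trace

    def step(self, user_input, call_llm, run_tool, summarize):
        self.trace.append(("user", user_input))
        if sum(estimate_tokens(str(t)) for t in self.trace) > MAX_CONTEXT_TOKENS:
            # Compress older turns instead of spawning a parallel subagent:
            # the summary replaces them but stays inside the single trace.
            head, tail = self.trace[:-4], self.trace[-4:]
            self.trace = [("summary", summarize(head))] + tail
        action = call_llm(self.trace)  # every decision sees the whole trace
        if action.get("tool"):
            result = run_tool(action["tool"], action.get("args", {}))
            self.trace.append(("tool", result))
        self.trace.append(("assistant", action.get("reply", "")))
        return action
```

The key design choice is that compression rewrites the one shared trace in place; no decision is ever made against a partial view that a sibling agent cannot see.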

Sentiment

Mixed but leaning supportive: many agree single-agent plus rigorous context engineering is most reliable today, while a sizable minority argues multi-agent can work with strong constraints, orchestration, and isolation.

In Agreement

  • Single-agent architectures with carefully engineered, linear context are more reliable than free-form multi-agent systems today.
  • Subagents are useful only as bounded tools (e.g., research/Q&A) to avoid polluting the main context; keep their deliberation out of the primary trace.
  • Context management is the core problem: dilution happens well before the window fills, so summarization/compression and selective inclusion are essential.
  • Pipelines beat freely communicating agents for many tasks; merging decision-making and execution in one step reduces brittleness.
  • Structured/constrained generation and restricted outputs improve reliability and mitigate prompt injection risk.
  • Empirical observation: beyond roughly 50k tokens of context, agents begin to drift from their goals; a single agent with good prompting often outperforms multi-agent orchestration.
  • Human-in-the-loop oversight remains important; unchecked autonomous multi-agents are error-prone.
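The structured/restricted-output point above can be made concrete with a validator that only executes actions from a whitelist. This is a hedged sketch: the action names, schema shape, and `parse_action` helper are illustrative assumptions, not from the post.

```python
# Restrict agent outputs to a whitelist of actions with typed arguments.
# Action names and schemas below are hypothetical examples.
import json

ALLOWED_ACTIONS = {
    "search": {"query": str},
    "read_file": {"path": str},
    "reply": {"text": str},
}


def parse_action(raw: str) -> dict:
    """Reject anything outside the allowed schema, so an injected
    instruction in retrieved text cannot trigger an unlisted tool."""
    action = json.loads(raw)
    name = action.get("name")
    schema = ALLOWED_ACTIONS.get(name)
    if schema is None:
        raise ValueError(f"action {name!r} not permitted")
    args = action.get("args", {})
    for key, typ in schema.items():
        if not isinstance(args.get(key), typ):
            raise ValueError(f"argument {key!r} missing or wrong type")
    extra = set(args) - set(schema)
    if extra:
        raise ValueError(f"unexpected arguments: {extra}")
    return {"name": name, "args": args}
```

Because the validator runs deterministically between the model and the executor, a prompt-injected "action" fails closed instead of being executed.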

Opposed

  • Separate agents for distinct rule sets (e.g., code editing vs. code search) reduce instruction conflicts; role-specific agents can outperform a single mixed-context agent.
  • Critic/reviewer subagents should not inherit context to remain unbiased; inheriting full traces can be harmful.
  • Parallel subagents can work when isolated by workspace boundaries and guarded by human review or orchestration, enabling faster progress on disjoint components.
  • The term “agent” is often just a prompt template; better to conceptualize systems as tools with clear interfaces and asynchronous execution rather than forbidding multi-agent designs.
  • Build better orchestration: agents emit intents to queues; executors act deterministically; shared knowledge stores and context optimizers can make multi-agent designs viable.
  • Less context can be more; selective omission and first-impression tools can outperform full-trace sharing, contradicting a blanket call to share everything.
  • We’re early; don’t follow dogma—experiment and do what works for your use case, as best practices are still evolving.
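The "agents emit intents to queues; executors act deterministically" proposal above can be sketched with a small orchestrator. The `Intent` and `Orchestrator` names and the shared log are illustrative assumptions about what such a design might look like, under the stated constraint that agents never act directly.

```python
# Sketch of intent-queue orchestration: agents emit intents, a single
# deterministic executor layer acts on them. Names here are hypothetical.
from collections import deque
from dataclasses import dataclass


@dataclass
class Intent:
    agent_id: str
    action: str
    payload: dict


class Orchestrator:
    def __init__(self, executors):
        self.queue = deque()
        self.executors = executors  # action name -> deterministic function
        self.log = []               # shared record every agent can read

    def emit(self, intent: Intent):
        self.queue.append(intent)   # agents only enqueue; they never execute

    def drain(self):
        while self.queue:
            intent = self.queue.popleft()
            handler = self.executors.get(intent.action)
            if handler is None:
                self.log.append((intent.agent_id, intent.action, "rejected"))
                continue
            result = handler(intent.payload)  # one deterministic execution path
            self.log.append((intent.agent_id, intent.action, result))
        return self.log
```

The shared log plays the role of the "shared knowledge store" commenters describe: every agent's intent and outcome is recorded in one place, which is the constraint that makes this multi-agent variant arguably viable.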