Letta Code: Stateful Coding Agents That Learn and Lead on Terminal-Bench

Letta Code turns coding agents into persistent teammates that learn from your codebase, actions, and feedback. It supports memory initialization, explicit reflection, and reusable skills stored as simple .md files, with powerful search over past work. It also ranks #1 among model-agnostic OSS harnesses on Terminal-Bench, rivaling provider-specific tools.
Key Points
- Letta Code creates long-lived, stateful coding agents that learn from past interactions and project context.
- /init bootstraps memory by analyzing the local codebase and rewriting the agent’s system prompt via memory blocks; /remember triggers explicit reflection.
- Repeated tasks can be captured as reusable, versioned skills (.md files) that agents can share and reuse.
- Agents can search persisted conversations and tools via Letta’s API with vector, full-text, and hybrid search.
- Letta Code is the #1 model-agnostic OSS harness on Terminal-Bench, matching provider-specific harness performance and surpassing prior OSS baselines.
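The memory-block mechanism above is white-box: each block is a labeled span of text compiled into the system prompt, so every persisted token can be inspected, edited, or deleted. A minimal sketch of that idea in plain Python (this is an illustration of the concept, not Letta's actual SDK; all class and method names here are hypothetical):

```python
from dataclasses import dataclass, field

# Hypothetical illustration of white-box memory blocks: each block is a
# labeled chunk of text, and the compiled system prompt is just their
# concatenation, so every injected token is auditable and editable (CRUD).

@dataclass
class MemoryStore:
    blocks: dict[str, str] = field(default_factory=dict)

    def upsert(self, label: str, value: str) -> None:
        """Create or overwrite a block (e.g. after an /init-style scan
        of the codebase or a /remember-style reflection step)."""
        self.blocks[label] = value

    def read(self, label: str) -> str:
        """Inspect exactly what the agent carries between sessions."""
        return self.blocks[label]

    def delete(self, label: str) -> None:
        """Prune a stale block to avoid context poisoning."""
        del self.blocks[label]

    def compile_system_prompt(self, base: str) -> str:
        """Inject every block into the prompt, tagged by its label."""
        sections = [f"<{label}>\n{value}\n</{label}>"
                    for label, value in self.blocks.items()]
        return "\n\n".join([base, *sections])

memory = MemoryStore()
memory.upsert("project", "Monorepo uses pnpm; run tests with `pnpm test`.")
memory.upsert("preferences", "User prefers small, reviewed diffs.")
prompt = memory.compile_system_prompt("You are a coding agent.")
```

Because the compiled prompt is plain text, a user can audit or prune any block directly, which is the property commenters contrast with opaque systems like ChatGPT memory.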
Sentiment
The community is curious and moderately supportive, with genuine interest in the memory-first coding agent concept. The most engaged commenters express healthy skepticism — particularly around context poisoning and whether memory adds value over well-structured docs — but are largely won over by Letta's transparent, auditable approach. The Letta co-founder's responsive engagement throughout the thread helps address concerns in real time, and the overall tone is constructive rather than dismissive.
In Agreement
- Letta's white-box, transparent memory approach is fundamentally different and better than opaque systems like ChatGPT memory, since users can inspect and edit every token injected into prompts.
- Long-term memory genuinely helps capture 'tribal knowledge' that wouldn't naturally end up in documentation files, like learned agent behaviors and cross-project preferences.
- The ability to perform CRUD operations on memory blocks directly addresses context poisoning concerns by giving users full control over what persists.
- Terminal-Bench performance demonstrates that the harness is competitive even without the memory features, validating it as a strong model-agnostic open-source coding tool.
- Skill learning — distilling repeated workflows into reusable, versioned files — is a compelling use case that goes beyond what static docs provide.
Opposed
- Well-maintained project documentation files (CLAUDE.md, llm.md) already solve most cross-session context problems without the complexity and risks of automated memory systems.
- ChatGPT's memory is a cautionary tale: it quickly fills with irrelevant or incorrect information, causing the model to respond from a polluted 'this is correct for this person' perspective.
- Context poisoning from accumulated stale memories is a real and serious problem that memory providers tend to exacerbate rather than solve.
- Long-term memory could cause agents to become 'stuck in their ways' like humans, resisting corrections or applying learned behavior to new situations where it no longer fits.
- Terminal-Bench is CLI-focused, and Cursor's absence from the leaderboard limits its usefulness as a benchmark for comparing IDE-centric coding tools.