The LLM-Wiki: Building Compounding Knowledge Bases

Andrej Karpathy suggests moving beyond simple retrieval-augmented generation (RAG) by having LLMs build and maintain a persistent, interlinked markdown wiki. The system uses a three-layer architecture (raw sources, the wiki, and a schema) so that knowledge compounds over time through incremental updates and automated health checks. Because the LLM handles the tedious bookkeeping of cross-referencing and summarizing, users can focus on high-level curation and exploration.

Key Points

Standard RAG is inefficient because the LLM must re-derive its synthesis from raw fragments on every query.
The LLM-Wiki pattern creates a persistent, compounding artifact where the LLM handles all cross-referencing, summarizing, and filing.
The system architecture relies on a three-layer stack: immutable raw sources, the generated markdown wiki, and a schema document that gives the LLM its instructions.
Automated 'linting' allows the LLM to periodically check the knowledge base for contradictions, stale claims, and missing links.
This approach realizes Vannevar Bush's 'Memex' vision by delegating the tedious bookkeeping of knowledge maintenance to AI.
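The structural part of the linting pass is simple enough to sketch without an LLM. The example below is hypothetical: it assumes one markdown file per page and Obsidian-style [[Page]] links, and lint_wiki and load_wiki are illustrative names rather than part of any published tool. It only catches broken links and orphan pages; spotting contradictions or stale claims would still require the LLM to read the pages.

```python
# Hypothetical sketch of a structural wiki lint: broken links and orphan pages.
# Assumes one markdown file per page and Obsidian-style [[Page]] links.
import re
from pathlib import Path

# Captures the target of [[Target]], [[Target|alias]] or [[Target#Heading]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:[#|][^\]]*)?\]\]")


def lint_wiki(pages: dict[str, str]) -> dict[str, list]:
    """Report broken links and orphan pages for a {page_name: markdown} map."""
    inbound = {name: 0 for name in pages}
    broken = []
    for name, text in pages.items():
        for target in WIKILINK.findall(text):
            target = target.strip()
            if target not in pages:
                broken.append((name, target))  # link to a page that does not exist
            elif target != name:
                inbound[target] += 1  # self-links do not rescue an orphan
    orphans = sorted(name for name, count in inbound.items() if count == 0)
    return {"broken": sorted(broken), "orphans": orphans}


def load_wiki(root: Path) -> dict[str, str]:
    """Read every *.md file under root, keyed by file stem (the page name)."""
    return {p.stem: p.read_text(encoding="utf-8") for p in root.rglob("*.md")}


if __name__ == "__main__":
    pages = {
        "Memex": "Associative trails. See [[Licklider]] and [[Zettelkasten]].",
        "Licklider": "Man-computer symbiosis; compare [[Memex|Bush's Memex]].",
        "RAG": "Retrieval-augmented generation.",
    }
    report = lint_wiki(pages)
    print(report["broken"])   # [('Memex', 'Zettelkasten')]
    print(report["orphans"])  # ['RAG']
```

A check like this is cheap enough to run after every update, which would leave the slower LLM review for semantic problems only.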

Sentiment

The community is notably skeptical. While many acknowledge the idea is interesting and some have built similar systems themselves, the dominant view questions whether this represents meaningful innovation beyond existing RAG and memory patterns. The strongest pushback centers on the philosophical concern that outsourcing knowledge organization to AI removes the cognitive benefits humans gain from doing that work themselves. Karpathy's reply, an AI-generated retort that was later deleted or flagged, further fueled skepticism about the approach.

In Agreement

The write loop where the LLM maintains, cross-references, and synthesizes wiki content goes beyond standard RAG and represents genuine knowledge synthesis rather than simple retrieval
The linting pass that audits for inconsistencies and suggests connections is a genuinely novel and valuable idea, similar to maintaining a zettelkasten
The wiki serves as a dual interface, readable by both humans (via Obsidian) and AI, which is the key architectural insight: unlike with a vector database, you can browse and understand the knowledge directly
Source granularity matters enormously — splitting content into chapter-level files rather than monolithic documents categorically changes the quality of wiki output
The approach echoes Licklider's 1960 vision of man-computer symbiosis, where humans set goals and supply motivations while machines handle routinizable operations
Keeping all project knowledge in the repo as markdown makes it accessible to both humans and agents through established conventions
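Heading-based splitting is one simple way to get the chapter-level source granularity described above. This is a hypothetical sketch assuming each chapter of a markdown source begins with a level-1 "# " heading; split_chapters, slugify and write_chapters are illustrative names, and sources in PDF or EPUB form would first need converting to markdown.

```python
# Hypothetical sketch: split a monolithic markdown source into chapter files.
# Assumes each chapter starts with a level-1 heading line ("# Title").
# Limitation: "# " lines inside fenced code blocks would be mistaken for headings.
import re
from pathlib import Path


def split_chapters(markdown: str) -> list[tuple[str, str]]:
    """Return (title, body) pairs; text before the first heading is a 'preamble'."""
    chapters: list[tuple[str, str]] = []
    title, lines = "preamble", []
    for line in markdown.splitlines():
        if line.startswith("# "):
            if "".join(lines).strip():  # only an empty preamble is skipped here
                chapters.append((title, "\n".join(lines).strip() + "\n"))
            title, lines = line[2:].strip(), [line]
        else:
            lines.append(line)
    if "".join(lines).strip():
        chapters.append((title, "\n".join(lines).strip() + "\n"))
    return chapters


def slugify(title: str) -> str:
    """Turn a title into a file stem, e.g. 'Chapter 1: Origins' -> 'chapter-1-origins'."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-") or "untitled"


def write_chapters(source: Path, out_dir: Path) -> list[Path]:
    """Write each chapter of source to out_dir/NN-slug.md and return the new paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    text = source.read_text(encoding="utf-8")
    for i, (title, body) in enumerate(split_chapters(text), start=1):
        path = out_dir / f"{i:02d}-{slugify(title)}.md"
        path.write_text(body, encoding="utf-8")
        paths.append(path)
    return paths
```

Each output file could then be ingested as its own raw source, so the LLM summarizes and cross-references one chapter at a time rather than a whole book.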

Opposed

This is fundamentally just RAG repackaged — the core problem of retrieving relevant information for LLM context remains the same whether using vector databases or structured filesystems
The 'grunt work' of manually organizing a knowledge base is where genuine insights and learning happen — delegating this to AI eliminates the cognitive value of the process itself
LLM-generated wikis will accumulate subtle errors because their content is second-order information derived from the sources, and users won't have time to fact-check everything, so it is better to go back to the original documents every time
Existing tools already do this — Claude memory, GitHub Copilot instruction files, AGENTS.md, and ChatGPT chat memory all implement similar persistent knowledge patterns
Next-generation models with larger context windows and faster throughput will make this intermediate layer obsolete, and custom abstractions built now will be irrelevant within months
A self-referential autonomous knowledge layer is valueless because it only makes itself more efficient — the real value comes from systems that support humans declaring how things should actually behave
Delegating architecture and discovery to LLM-managed wikis creates a new form of tech debt and 'AI de-skilling', in which developers acquire persistent knowledge gaps that mirror the agent's limitations