LLMs Are Lossy Encyclopedias: Give Them Facts to Work With
Simon Willison likens LLMs to lossy encyclopedias that compress knowledge and lose detail. They are effective at reasoning over and transforming inputs, but unreliable for precise, configuration-level facts. When specificity matters, give the model accurate examples and let it work from those.
Key Points
- LLMs function like lossy encyclopedias: they compress knowledge and lose fine-grained details.
- They work best for general reasoning and transformation, not for exact, highly specific facts.
- Asking for a fully correct, detailed project skeleton is effectively a lossless encyclopedia request.
- Provide accurate examples or documentation, then let the model act on those facts.
- Treat LLMs as tools that transform provided inputs rather than authoritative sources of precise configurations.
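The advice above can be sketched as a prompt-assembly step: rather than asking the model to recall a configuration from memory, paste the authoritative snippet into the prompt and ask it to transform that input. This is a minimal illustration; the function name and prompt wording are assumptions, not from the article.

```python
# Sketch of "give the model facts to work with": supply exact reference
# material inline so the model transforms provided input instead of
# relying on lossy recall. Hypothetical helper, illustrative wording.

def build_grounded_prompt(task: str, facts: list[str]) -> str:
    """Assemble a prompt that embeds exact reference material inline."""
    sources = "\n\n".join(f"[source {i + 1}]\n{f}" for i, f in enumerate(facts))
    return (
        "Use ONLY the reference material below; do not rely on "
        "memorized details.\n\n"
        f"{sources}\n\n"
        f"Task: {task}"
    )

prompt = build_grounded_prompt(
    task="Write the pyproject.toml section that enables this plugin.",
    facts=["[tool.example]\nenabled = true  # copied from the plugin docs"],
)
```

The resulting string is then sent to whatever LLM API you use; the point is that the precise facts travel with the request instead of being reconstructed from the model's weights.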
Sentiment
The community broadly agrees with the article's practical advice: don't trust LLMs for precise factual recall; instead, provide them with facts to work with. However, there is substantial constructive debate about whether "lossy encyclopedia" is the right metaphor, with many finding it useful but technically misleading. A notable tension runs between those who believe users should adapt to LLM limitations and those who believe AI companies should engineer better safety rails.
In Agreement
- Users need domain knowledge to evaluate LLM output, and developing intuition about what questions LLMs handle well vs. poorly is essential
- Providing correct context and source material makes LLMs dramatically more effective, validating the article's core practical advice
- LLMs combined with search tools and RAG largely solve the factual accuracy problem, supporting the article's framework
- The Gell-Mann amnesia effect applies to LLM interactions — people trust outputs in unfamiliar domains while spotting errors in their areas of expertise
- LLMs are useful tools that require skill to use effectively, similar to other professional instruments
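The RAG point above can be illustrated with the retrieval half of the pipeline: score candidate documents against the question and hand the best match to the model as context. Real systems use embeddings and vector search; this toy keyword-overlap scorer is only a sketch of the workflow the commenters describe, and all names in it are made up for illustration.

```python
# Minimal retrieval sketch for RAG: pick the document sharing the most
# words with the question, then feed it to the model as grounding context.
# Toy scorer only; production systems use embedding similarity instead.

def retrieve(question: str, documents: list[str]) -> str:
    """Return the document with the largest word overlap with the question."""
    q_words = set(question.lower().split())
    return max(documents, key=lambda d: len(q_words & set(d.lower().split())))

docs = [
    "The config file lives at /etc/app/settings.yaml and uses YAML syntax.",
    "Release notes for version 2.0 describe the new plugin system.",
]
best = retrieve("Where is the config file and what syntax does it use?", docs)
# `best` would then be embedded in the prompt so the model answers from
# retrieved facts rather than lossy recall.
```

This is why search tools and RAG mitigate the "lossy" failure mode: the precise facts are fetched at query time instead of being reconstructed from compressed training data.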
Opposed
- "Lossy" is misleading because traditional lossy compression degrades predictably while LLMs fabricate plausible falsehoods with full confidence
- The analogy puts too much responsibility on users when AI companies should be engineering better guardrails and "I don't know" responses
- Alternative analogies like "unreliable librarian" or "plausibility simulator" better capture the failure mode of generating confident wrong answers
- The article states something that should be obvious to anyone familiar with neural network fundamentals
- The framing still implicitly endorses using LLMs for factual queries when the real solution is better tool design