LLMs Are Lossy Encyclopedias: Give Them Facts to Work With
Simon Willison likens LLMs to lossy encyclopedias that compress knowledge and lose detail. They are effective at reasoning over and transforming inputs, but unreliable for precise, configuration-level facts. When specificity matters, give the model accurate examples and let it work from those.
Key Points
- LLMs function like lossy encyclopedias: they compress knowledge and lose fine-grained details.
- They work best for general reasoning and transformation, not for exact, highly specific facts.
- Asking for a fully correct, detailed project skeleton is effectively a lossless encyclopedia request.
- Provide accurate examples or documentation, then let the model act on those facts.
- Treat LLMs as tools that transform provided inputs rather than authoritative sources of precise configurations.
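The advice above can be sketched as a prompt-assembly step: rather than asking the model to recall a configuration from memory, paste the authoritative snippet into the prompt and ask it to transform that input. This is a minimal illustration; the function name and prompt wording are assumptions, not from the article.

```python
# Sketch of "give the model facts to work with": supply exact reference
# material inline so the model transforms provided input instead of
# relying on lossy recall. Hypothetical helper, illustrative wording.

def build_grounded_prompt(task: str, facts: list[str]) -> str:
    """Assemble a prompt that embeds exact reference material inline."""
    sources = "\n\n".join(f"[source {i + 1}]\n{f}" for i, f in enumerate(facts))
    return (
        "Use ONLY the reference material below; do not rely on "
        "memorized details.\n\n"
        f"{sources}\n\n"
        f"Task: {task}"
    )

prompt = build_grounded_prompt(
    task="Write the pyproject.toml section that enables this plugin.",
    facts=["[tool.example]\nenabled = true  # copied from the plugin docs"],
)
```

The resulting string is then sent to whatever LLM API you use; the point is that the precise facts travel with the request instead of being reconstructed from the model's weights.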
Sentiment
The community broadly agrees with the article's practical advice: don't trust LLMs for precise factual recall; instead, provide them with facts to work with. However, there is substantial constructive debate about whether "lossy encyclopedia" is the right metaphor, with many finding it useful but technically misleading. A notable tension runs between those who believe users should adapt to LLM limitations and those who believe AI companies should engineer better safety rails.
In Agreement
- Users need domain knowledge to evaluate LLM output, and developing intuition about what questions LLMs handle well vs. poorly is essential
- Providing correct context and source material makes LLMs dramatically more effective, validating the article's core practical advice
- LLMs combined with search tools and RAG largely solve the factual accuracy problem, supporting the article's framework
- The Gell-Mann amnesia effect applies to LLM interactions — people trust outputs in unfamiliar domains while spotting errors in their areas of expertise
- LLMs are useful tools that require skill to use effectively, similar to other professional instruments
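The RAG point above can be illustrated with the retrieval half of the pipeline: score candidate documents against the question and hand the best match to the model as context. Real systems use embeddings and vector search; this toy keyword-overlap scorer is only a sketch of the workflow the commenters describe, and all names in it are made up for illustration.

```python
# Minimal retrieval sketch for RAG: pick the document sharing the most
# words with the question, then feed it to the model as grounding context.
# Toy scorer only; production systems use embedding similarity instead.

def retrieve(question: str, documents: list[str]) -> str:
    """Return the document with the largest word overlap with the question."""
    q_words = set(question.lower().split())
    return max(documents, key=lambda d: len(q_words & set(d.lower().split())))

docs = [
    "The config file lives at /etc/app/settings.yaml and uses YAML syntax.",
    "Release notes for version 2.0 describe the new plugin system.",
]
best = retrieve("Where is the config file and what syntax does it use?", docs)
# `best` would then be embedded in the prompt so the model answers from
# retrieved facts rather than lossy recall.
```

This is why search tools and RAG mitigate the "lossy" failure mode: the precise facts are fetched at query time instead of being reconstructed from compressed training data.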
Opposed
- "Lossy" is misleading because traditional lossy compression degrades predictably while LLMs fabricate plausible falsehoods with full confidence
- The analogy puts too much responsibility on users when AI companies should be engineering better guardrails and "I don't know" responses
- Alternative analogies like "unreliable librarian" or "plausibility simulator" better capture the failure mode of generating confident wrong answers
- The article states something that should be obvious to anyone familiar with neural network fundamentals
- The framing still implicitly endorses using LLMs for factual queries when the real solution is better tool design