Caveman: Ultra-Efficient Token Compression for Claude Code

Caveman is a Claude Code skill that claims to cut output token usage by about 75% by having the model adopt a primitive communication style. It removes conversational filler and pleasantries while preserving technical accuracy and leaving code intact. The goal is faster, cheaper AI interactions that focus strictly on essential information.

Key Points

Reduces output token usage by approximately 75% by eliminating linguistic fluff and filler.
Maintains 100% technical accuracy, keeping code blocks and complex technical terms intact.
Lowers cost and speeds up response generation.
Easy to install and invoke in Claude Code through a slash command or a natural-language trigger.
Removes non-essential elements such as articles, pleasantries, and hedging, while still writing git commits and PRs in their normal format.
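As a Claude Code skill, Caveman shapes how the model writes; it does not post-process text. To make the listed rules concrete, here is a hypothetical regex filter that approximates them: it drops sample pleasantries, hedges, and articles from prose while leaving fenced code untouched. The phrase lists and the names compress_prose, PLEASANTRIES, HEDGES, and FENCE are illustrative assumptions, not taken from the skill.

```python
import re

# Illustrative filler lists: assumptions for this sketch, not the skill's actual rules.
PLEASANTRIES = ["I'd be happy to help", "Great question", "Hope this helps"]
HEDGES = ["I think", "It seems that", "Perhaps"]
ARTICLES = re.compile(r"\b(?:a|an|the)\b[ \t]*", re.IGNORECASE)
FENCE = "`" * 3  # markdown code fence delimiter


def compress_prose(text: str) -> str:
    """Drop filler phrases and articles from prose; leave fenced code untouched."""
    parts = text.split(FENCE)
    # After splitting on fences, even-indexed parts are prose, odd-indexed parts are code.
    for i in range(0, len(parts), 2):
        prose = parts[i]
        for phrase in PLEASANTRIES + HEDGES:
            # Remove the phrase plus any trailing punctuation mark and spaces.
            pattern = r"\b" + re.escape(phrase) + r"[.!,]?[ \t]*"
            prose = re.sub(pattern, "", prose, flags=re.IGNORECASE)
        prose = ARTICLES.sub("", prose)
        prose = re.sub(r"[ \t]{2,}", " ", prose)  # collapse leftover runs of spaces
        parts[i] = prose
    return FENCE.join(parts)


reply = (
    "Great question! I think the bug is in a loop.\n"
    + FENCE + "python\nx = a.the\n" + FENCE + "\n"
)
print(compress_prose(reply))
```

Here the prose line becomes "bug is in loop." while `x = a.the` inside the fence survives unchanged. A regex cannot tell when "a" carries meaning (for example "Option A"), so the sketch only describes the rules; it is not a substitute for having the model write tersely.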

Sentiment

The community was largely skeptical. While many appreciated the humor and the creative concept, the dominant technical arguments held that compressed output could degrade model performance and that output tokens are not where the real cost lies. The author's graceful response acknowledging the need for benchmarks was well received, but the overall consensus favored caution toward such approaches until they are properly evaluated.

In Agreement

Removing filler words and preamble from LLM output reduces noise without necessarily losing technical substance, since phrases like 'I'd be happy to help' carry minimal useful signal.
Caveman mode has unexpected value for human readers: the stripped-down phrasing can cut through complexity and make problems easier to understand.
Research suggests that concise prompting can shorten responses without necessarily degrading quality, and the idea is a natural extension of how tokenization already compresses language.
In multi-turn conversations, output token savings compound and become more significant over time.
The concept parallels the observation that coding in Chinese can take fewer tokens for similar results, suggesting there is room for more token-efficient LLM communication styles.
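The compounding effect follows from how chat context works: each request re-sends the earlier conversation as input, so a shorter reply is paid for once as output and again as input on every later turn. A minimal sketch under simplifying assumptions (fixed message sizes, no prompt caching or context compaction); the function names and the token counts in the example are hypothetical.

```python
def conversation_tokens(turns: int, system: int, user: int, output: int) -> tuple[int, int]:
    """Return (total input tokens, total output tokens) across a multi-turn chat.

    Simplified model (assumption): each request re-sends a fixed `system` context
    plus every earlier user message and assistant reply; message sizes are constant
    and prompt caching / context compaction are ignored.
    """
    total_input = 0
    for k in range(1, turns + 1):
        # Request k carries k user messages and the k - 1 replies produced so far.
        total_input += system + k * user + (k - 1) * output
    return total_input, turns * output


def savings(turns: int, system: int, user: int, output: int,
            reduction: float = 0.75) -> tuple[int, int]:
    """Return (input tokens saved, output tokens saved) when replies shrink by `reduction`."""
    base_in, base_out = conversation_tokens(turns, system, user, output)
    terse_reply = round(output * (1 - reduction))
    terse_in, terse_out = conversation_tokens(turns, system, user, terse_reply)
    return base_in - terse_in, base_out - terse_out


# Hypothetical sizes: 10 turns, 10,000-token fixed context, 200-token prompts, 800-token replies.
print(savings(10, 10_000, 200, 800))
```

With these made-up numbers, shorter replies save 6,000 output tokens and 27,000 input tokens over ten turns, so the re-sent history is where most of the savings come from. The 27,000 saved is still under a fifth of the 147,000 input tokens, most of which is the fixed context, which echoes the objection that input context dominates.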

Opposed

Tokens are 'units of thinking' for LLMs, so constraining output style forces the model to spend its attention budget on how to say things rather than what to say, effectively making it less intelligent.
Output tokens are not the real bottleneck in agentic coding; input tokens from skills, directory trees, and tool outputs dominate usage, making output compression relatively insignificant.
Low-entropy filler tokens may give the model extra forward passes in which to do useful hidden-state computation, so removing them could take away its 'room to think.'
The 75% savings figure is an empirical claim presented without benchmarks or rigorous evaluation.
Thinking tokens are unaffected by the skill, and the model may need extra thinking tokens to rephrase into caveman style, potentially increasing total cost.
Training data associates caveman-style speech with non-technical contexts, so forcing this style likely biases the model away from rigorous technical output.
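The cost objections above (input tokens dominating, thinking tokens left untouched) reduce to simple arithmetic. A back-of-the-envelope sketch for a single request, assuming thinking tokens are billed at the output rate and treating any extra thinking spent on rephrasing as a hypothetical overhead; the function name, prices, and token counts are illustrative, not measured.

```python
def cost_saving_fraction(
    input_tokens: int,
    output_tokens: int,
    thinking_tokens: int,
    in_price: float,
    out_price: float,
    reduction: float = 0.75,
    extra_thinking: int = 0,
) -> float:
    """Fraction of one request's cost saved by shrinking visible output.

    Assumptions: thinking tokens are billed at the output rate; input and thinking
    tokens are unchanged except for `extra_thinking`, a hypothetical overhead for
    rephrasing the answer into the terse style.
    """
    before = input_tokens * in_price + (output_tokens + thinking_tokens) * out_price
    terse_output = output_tokens * (1 - reduction)
    after = input_tokens * in_price + (terse_output + thinking_tokens + extra_thinking) * out_price
    return (before - after) / before


# Hypothetical agentic request: large input, output priced at 5x input (arbitrary units).
print(cost_saving_fraction(50_000, 1_000, 2_000, in_price=1.0, out_price=5.0))
print(cost_saving_fraction(50_000, 1_000, 2_000, in_price=1.0, out_price=5.0, extra_thinking=500))
```

With these figures, a 75% cut in visible output saves about 5.8% of the request's cost; 500 extra thinking tokens shrink that to about 1.9%, and 750 extra (exactly the output tokens removed) erase the saving entirely.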