
The Hidden 30% Tax in Claude 4.7
707
Claude 4.7 uses significantly more tokens for the same text, increasing session costs by ~30% in exchange for better instruction following.
How text is split into tokens for language models, including tokenizer design, vocabulary construction, byte-pair encoding, and the downstream effects of tokenization choices on model behavior and output.

Claude 4.7 uses significantly more tokens for the same text, increasing session costs by ~30% in exchange for better instruction following.
Models compose “seahorse + emoji,” but with no matching token the unembedding snaps to a nearby emoji, causing confident errors and occasional feedback loops.