Slash Claude API Costs with Automated Prompt Caching
Article: Very Positive | Community: Negative/Mixed

This open-source plugin automates prompt caching for Anthropic's Claude models, slashing token costs by up to 90%. It offers specialized modes for debugging and refactoring while providing detailed metrics on cache performance and savings. It is an essential tool for developers building custom agents who need to optimize their SDK usage and monitor cache hit rates.
Key Points
- Reduces token costs by up to 90% by leveraging Anthropic's 0.1x cost for cache reads.
- Features four specialized modes—BugFix, Refactor, File Tracking, and Conversation Freeze—to optimize different coding workflows.
- Provides observability tools that track hit rates and cumulative savings, which are not natively exposed in standard clients.
- Designed for developers building custom applications with the Anthropic SDK rather than just standard Claude Code users.
- Compatible with any MCP-compliant client such as Cursor, Windsurf, and Zed via a simple npm installation.
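The savings math behind the first and third points can be sketched from the usage fields Anthropic's Messages API returns. Per Anthropic's published prompt-caching pricing, cache reads bill at 0.1x the base input rate and cache writes at 1.25x; the helper below (a hypothetical illustration, not the plugin's actual code) estimates the effective input cost and savings from those multipliers:

```python
# Hypothetical sketch of cache-savings accounting, assuming Anthropic's
# documented multipliers: cache writes at 1.25x, cache reads at 0.1x
# the base input-token rate.
def estimate_input_cost(usage: dict, base_rate_per_mtok: float):
    """Return (effective_cost, savings_vs_no_cache) in dollars.

    `usage` mirrors the Messages API usage object: input_tokens,
    cache_creation_input_tokens, cache_read_input_tokens.
    """
    regular = usage.get("input_tokens", 0)
    writes = usage.get("cache_creation_input_tokens", 0)
    reads = usage.get("cache_read_input_tokens", 0)

    # Effective cost with caching: each token class at its multiplier.
    cost = (regular * 1.0 + writes * 1.25 + reads * 0.1) \
        * base_rate_per_mtok / 1_000_000
    # Counterfactual: every token billed at the base input rate.
    no_cache = (regular + writes + reads) * base_rate_per_mtok / 1_000_000
    return cost, no_cache - cost
```

A request whose input is served entirely from cache reads would cost 0.1x the uncached price, which is where the "up to 90%" figure comes from; real workloads land lower because cache writes cost 1.25x and first requests are never cached.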
Sentiment
The community is strongly dismissive. There is near-consensus that this plugin solves a non-problem, since Anthropic's native auto-caching already handles what the plugin claims to automate. The few defenders point to observability features and the API-vs-Claude-Code distinction, but these arguments gain little traction.
In Agreement
- The plugin provides observability features like cache hit rates and savings visibility that the raw Anthropic API and Claude Code do not expose, which could still have value for developers building custom agents.
- There is a legitimate distinction between Claude Code's built-in caching and the API-level caching this plugin targets — many commenters conflate the two, and developers making raw SDK calls may still benefit from automated breakpoint placement.
Opposed
- Anthropic's auto-caching with `cache_control: {"type": "ephemeral"}` has already solved automatic breakpoint placement, making this plugin redundant — a fact the tool's own FAQ acknowledges.
- An MCP server is the wrong abstraction for cache management because it requires an LLM to invoke a tool call before sending a request, adding unnecessary overhead at the wrong level.
- The tool appears to be dead on arrival, with a domain registered just the day before the HN post, raising questions about the project's legitimacy and longevity.
- The problem itself is trivial to solve — you just set your preferred cache level on the last message and Anthropic handles the rest automatically.
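The "trivial to solve" claim above can be made concrete. A hypothetical helper (not the plugin's code, and simplified relative to real SDK usage) that marks the last content block of the last message as an ephemeral cache breakpoint, so Anthropic caches the entire prefix up to that point, looks like this:

```python
# Sketch of the manual approach commenters describe: attach a
# cache_control breakpoint to the final content block of the last
# message before calling client.messages.create(...).
def mark_cache_breakpoint(messages: list[dict]) -> list[dict]:
    msgs = [dict(m) for m in messages]
    last = msgs[-1]
    content = last["content"]
    if isinstance(content, str):
        # Normalize plain-string content to block form so it can
        # carry a cache_control annotation.
        content = [{"type": "text", "text": content}]
    else:
        content = [dict(b) for b in content]
    content[-1] = {**content[-1],
                   "cache_control": {"type": "ephemeral"}}
    last["content"] = content
    return msgs
```

In this view the entire conversation prefix is cached on each turn with one annotation, which is the commenters' argument that no tool-call-based MCP layer is needed.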