cc-canary: Local Drift Detection for Claude Code

cc-canary is a local drift detection tool for Claude Code that analyzes session logs to identify performance regressions. It generates comprehensive Markdown or HTML reports featuring metrics like reasoning depth, cost trends, and tool-use efficiency. The tool is designed for privacy, operating entirely offline and redacting sensitive user data during the reporting process.

Key Points

Detects performance regressions in Claude Code by analyzing local JSONL session logs.
Generates detailed forensic reports with verdicts such as 'Confirmed Regression' or 'Holding'.
Tracks specific behavioral metrics including Read:Edit ratios, reasoning loops, and frustration rates.
Prioritizes privacy by running entirely offline with no telemetry or external data transmission.
Identifies specific 'inflection dates' where model behavior significantly deviated from historical norms.
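
To make the Read:Edit ratio and 'inflection date' ideas concrete, here is a minimal sketch rather than cc-canary's actual implementation. It assumes a simplified, hypothetical log schema in which each JSONL line is a flat object with a `timestamp` (ISO 8601) and a `tool` name; real Claude Code session logs are structured differently. The function names and the 50% drop threshold are illustrative assumptions as well.

```python
import json
from collections import defaultdict
from statistics import mean


def read_edit_ratio_by_day(jsonl_lines):
    """Return {YYYY-MM-DD: reads / edits} for days with at least one Edit.

    Assumes the hypothetical flat schema {"timestamp": ..., "tool": ...}.
    """
    counts = defaultdict(lambda: {"Read": 0, "Edit": 0})
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the log
        record = json.loads(line)
        tool = record.get("tool")
        if tool in ("Read", "Edit"):
            day = record["timestamp"][:10]  # date part of an ISO timestamp
            counts[day][tool] += 1
    return {
        day: c["Read"] / c["Edit"]
        for day, c in sorted(counts.items())
        if c["Edit"] > 0  # skip days where the ratio is undefined
    }


def find_inflection_date(ratios, drop_fraction=0.5, min_history=3):
    """Return the first day whose ratio falls below drop_fraction times the
    mean of all earlier days, or None if no such day exists."""
    days = sorted(ratios)
    for i in range(min_history, len(days)):
        baseline = mean(ratios[d] for d in days[:i])
        if ratios[days[i]] < drop_fraction * baseline:
            return days[i]
    return None
```

Calling `find_inflection_date(read_edit_ratio_by_day(open(path)))` on a log in the assumed schema would return the first day on which the model read far less per edit than it had historically. A threshold rule like this cannot tell model drift apart from a change in the work being done, which is the objection raised in the Opposed section.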

Sentiment

The community is largely skeptical of cc-canary's approach; the dominant view is that using an AI to monitor itself is circular. While commenters acknowledge that drift and regressions are real, most question whether this tool can meaningfully detect them given the uncontrolled variables. There is also an undercurrent of frustration with AI coding tools that require constant babysitting.

In Agreement

Drift is real, and providers can change model behavior in many opaque ways (system prompts, compute allocation, model swaps, and harness changes), which makes independent monitoring valuable.
The tool addresses a genuine need for solo developers who find running formal evals too expensive but still want to track whether prompt or skill changes improve or degrade performance.
The unconventional local-only, privacy-first approach may reveal issues that standard benchmarks miss.

Opposed

Using the same black box to analyze itself is fundamentally unreliable: an LLM told to find regressions will tend to find them even where none exist.
The methodology fails to control for confounding variables like codebase growth, task complexity, and project evolution, so any detected 'regression' is potentially meaningless.
The tool's assessment categories are biased toward finding problems (there is no option for 'better than baseline'), and metrics like Read:Edit ratios naturally change as codebases grow.
If you need a canary to monitor your AI coding tool, that is a signal the tool isn't trustworthy enough to rely on; better to find a tool you trust.
To be scientifically valid, drift detection requires static targets and repeated attempts, neither of which this tool provides.
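
As an illustration of what the last two objections ask for, the sketch below compares repeated attempts at the same fixed task and returns a verdict that can go either way, including 'improvement'. It is not part of cc-canary. The function name, the score inputs, and the 0.05 significance level are assumptions chosen for the example, and the permutation test is one of several reasonable ways to make the comparison.

```python
import random
from statistics import mean


def symmetric_verdict(baseline, current, alpha=0.05, n_permutations=2000, seed=0):
    """Compare scores from repeated runs of one fixed task.

    Returns 'regression', 'improvement', or 'holding' based on a
    two-sided permutation test on the difference of means.
    """
    observed = mean(current) - mean(baseline)
    pooled = list(baseline) + list(current)
    n_base = len(baseline)
    rng = random.Random(seed)  # seeded so the verdict is reproducible

    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = mean(pooled[n_base:]) - mean(pooled[:n_base])
        # small tolerance so float rounding does not hide exact ties
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1

    # add-one correction keeps the estimated p-value strictly positive
    p_value = (extreme + 1) / (n_permutations + 1)
    if p_value >= alpha:
        return "holding"
    return "improvement" if observed > 0 else "regression"
```

Because the task is held fixed and attempted several times before and after, a shift in the scores cannot be explained by codebase growth or changing task difficulty. The two-sided test also leaves room for the 'better than baseline' outcome that commenters found missing.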