GitHub Study Finds Claude Code Cache Bugs Inflate Token Usage by 10‑20×
Developers expected Claude Code to streamline prompts, but a recent report shows cache bugs inflating token usage by 10‑20×, turning cost‑saving promises into massive overruns.
Key Facts
- Key company: Anthropic (maker of Claude Code)
The first‑turn cache miss is the silent culprit behind most of the bloat, according to the GitHub analysis of Claude Code’s telemetry. Even after the B1‑B2 cache bugs were patched in v2.1.91, 79 percent of new sessions still begin with a cache_read of 0 on the initial API call — a structural flaw that forces the model to resend the entire system prompt and skill set on every fresh conversation (“Full data” section). The problem stems from the way Claude Code now injects skills and CLAUDE.md into messages[0] instead of the system[] prefix, breaking the prefix‑based caching logic that earlier versions relied on. Community testing shows a modest improvement in later releases (≈29 % first‑turn hits in v2.1.104), but the penalty remains large enough to inflate token consumption by an order of magnitude.
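The mechanics can be illustrated with a minimal sketch. This is not Claude Code's actual client code; it only builds request payloads in the shape of the Anthropic Messages API to show why moving skill text from the `system[]` prefix into `messages[0]` defeats prefix‑based caching (the `SKILLS_AND_CLAUDE_MD` placeholder and both helper names are hypothetical):

```python
# Illustrative sketch (not Claude Code's real implementation): prefix-based
# caching only reuses content that sits in a stable, cacheable system prefix.
# Moving the same text into messages[0] means the first call of every new
# session starts with cache_read = 0.

SKILLS_AND_CLAUDE_MD = "<placeholder for skills + CLAUDE.md text>"

def cacheable_payload(user_msg: str) -> dict:
    """Skills live in the system[] prefix and carry a cache breakpoint,
    so repeat sessions can hit the cached prefix on turn one."""
    return {
        "system": [
            {
                "type": "text",
                "text": SKILLS_AND_CLAUDE_MD,
                "cache_control": {"type": "ephemeral"},  # cache breakpoint
            }
        ],
        "messages": [{"role": "user", "content": user_msg}],
    }

def uncacheable_payload(user_msg: str) -> dict:
    """Skills are prepended to messages[0]; the stable system prefix is
    empty, so there is nothing for the prefix cache to match against."""
    return {
        "system": [],
        "messages": [
            {"role": "user", "content": SKILLS_AND_CLAUDE_MD + "\n" + user_msg}
        ],
    }

good = cacheable_payload("fix the bug")
bad = uncacheable_payload("fix the bug")
# Only the first layout keeps the large static text in the cacheable prefix.
assert good["system"][0]["cache_control"] == {"type": "ephemeral"}
assert bad["system"] == []
```

The payloads are built but never sent, so the sketch runs offline; the `cache_control`/`ephemeral` fields follow the Anthropic API's prompt‑caching convention.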
Beyond the first‑turn miss, a cascade of unfixed client‑side bugs continues to gnaw at efficiency. The repository lists eleven confirmed issues (B1‑B5, B8, B8a, B9, B10, B11, B2a) and four preliminary findings (P1‑P4). While the cache‑related B1‑B2 bugs were addressed, nine bugs persist as of v2.1.101, the latest public release — including B8a’s non‑atomic JSONL writes that corrupt session files, B9’s branch‑context inflation that balloons token counts from 6 % to 73 %, and B10’s deprecation of TaskOutput which injects 21 times more context and can cause fatal errors (“Details” section). Anthropic has publicly acknowledged B11 (adaptive thinking with zero‑reasoning) on Hacker News, but no follow‑up has been posted, leaving developers to grapple with fabricated outputs that further waste tokens.
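For readers unfamiliar with the failure mode behind B8a: when two writers append to the same JSONL session file without atomicity, partial lines can interleave and leave unparseable JSON behind. A common mitigation, sketched below as an assumption rather than the project's actual fix, is to append each record as a single `O_APPEND` write and to route full rewrites through a temp file swapped in with `os.replace`:

```python
import json
import os
import tempfile

def append_jsonl_atomically(path: str, record: dict) -> None:
    """Append one record as a single write() of a complete line, so a
    crash or concurrent writer cannot leave a half-written JSON object."""
    line = json.dumps(record) + "\n"
    # O_APPEND makes the kernel pick the offset atomically per write call.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        os.write(fd, line.encode("utf-8"))
    finally:
        os.close(fd)

def rewrite_jsonl_atomically(path: str, records: list) -> None:
    """Write the full file to a temp sibling, then os.replace() it into
    place, so readers never observe a truncated session file."""
    dir_ = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=dir_, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            for rec in records:
                f.write(json.dumps(rec) + "\n")
        os.replace(tmp, path)  # atomic rename on POSIX and Windows
    except BaseException:
        os.unlink(tmp)
        raise
```

Keeping the temp file in the same directory as the target matters: `os.replace` is only atomic within a single filesystem.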
The “Output efficiency” system prompt, once a quiet lever for trimming verbosity (P3), vanished without fanfare. A scan of 353 local session JSONL files revealed that after April 10 the prompt text — “straight to the point / do not overdo” — disappeared entirely, suggesting a deliberate removal in the codebase (“P3 update”). Without that guardrail, Claude Code’s responses have drifted toward verbosity, compounding the token surge caused by the lingering bugs. The removal went unnoticed until a community member, @wjordan, diffed system‑prompt archives and flagged the change, showing how subtle internal tweaks can ripple into large cost overruns for users.
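A scan like the one described can be done with a few lines of Python. This is a hedged sketch of the general approach, not the researcher's actual script; the directory layout and the choice of a plain substring match are assumptions:

```python
import glob
import os

# Fragment of the removed "Output efficiency" prompt text quoted in the report.
PHRASE = "straight to the point"

def sessions_containing(phrase: str, session_dir: str) -> list:
    """Return the session JSONL files whose raw text mentions the phrase.
    A substring scan over raw lines is enough here: we only need
    presence/absence per file, not which JSON field held the prompt."""
    hits = []
    for path in glob.glob(os.path.join(session_dir, "*.jsonl")):
        with open(path, encoding="utf-8", errors="replace") as f:
            if any(phrase in line for line in f):
                hits.append(path)
    return sorted(hits)
```

Bucketing the resulting file list by session date would then expose the cutoff after which the prompt text stops appearing.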
Proxy data collected over 13 days (27,708 requests) paints a stark picture of fallback behavior. Every single request recorded a fallback‑percentage of 0.5, a metric that remains undocumented but appears to indicate that half of each call defaults to a backup path, effectively doubling token usage. Community researchers @cnighswonger (11,502 calls, max 5× US) and @0xNightDev (max 5× EU) reported the same 0.5 figure, though the latter saw request rejections while the former’s calls were allowed, hinting at organization‑level flag differences that are not publicly explained (“Full data”). The consistency of the 0.5 fallback across diverse accounts suggests a systemic issue rather than isolated misconfigurations.
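The "same 0.5 across every account" observation is the kind of thing a simple distribution count over proxy logs makes obvious. The sketch below assumes a one‑JSON‑object‑per‑line log with a `fallback_percentage` field (the field name is inferred from the report's undocumented metric and is an assumption):

```python
import json
from collections import Counter

def fallback_distribution(log_lines) -> Counter:
    """Count how often each fallback_percentage value appears across
    proxy log records (one JSON object per line)."""
    counts = Counter()
    for line in log_lines:
        rec = json.loads(line)
        counts[rec.get("fallback_percentage")] += 1
    return counts
```

A single key in the resulting counter, as the researchers saw with 0.5 across 27,708 requests, points at a systemic flag rather than per‑request behavior.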
Taken together, the data tells a cautionary tale: Claude Code’s promise of cost‑effective, code‑centric prompting is being undercut by a perfect storm of caching mishaps, lingering bugs, and silent feature removals. Developers who adopted the tool expecting modest token footprints are now facing 10‑20× overruns, a reality that forces a reassessment of budgeting and architecture choices. Anthropic’s incremental fixes—such as the B2a resume‑path patch in v2.1.101—offer a glimpse of progress, but the roadmap remains opaque, and the community is left to patch, monitor, and document workarounds until a more stable release lands.
Sources
Reporting based on verified sources and public filings. Sector HQ editorial standards require multi-source attribution.