Claude Code Tokens Are Traced to Their Exact Destinations, Revealing Full Flow Paths
Photo by Compare Fibre on Unsplash
200,000 tokens. That’s the context budget Claude Code users pay for, yet the tracing shows only about 25% of those tokens actually perform work, with the remainder falling into three separate, non‑productive categories.
Key Facts
- Key product: Claude Code (Anthropic)
Claude Code’s token accounting is more opaque than most developers realize, and the new tracing work from Slim’s “ClaudeTUI” toolset pulls back the curtain on exactly how the 200k-token context budget is consumed. By parsing raw JSONL transcripts from hundreds of real‑world sessions, the analysis identified four distinct token buckets, but only one—user‑generated content—actually contributes to productive work. The other three—system prompt, compaction summaries, and cached reads—are essentially overhead that still counts against the context window while inflating the apparent cost of a session (Slim, Mar 14).
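A minimal sketch of that bucketing, assuming hypothetical transcript records that already carry a `bucket` label and a `tokens` count (the real JSONL schema used by ClaudeTUI may differ):

```python
import json

# Hypothetical bucket names mirroring the four categories described above;
# the actual field names in Claude Code transcripts are assumptions here.
BUCKETS = ("system_prompt", "compaction_summary", "cache_read", "user_content")

def bucket_tokens(jsonl_lines):
    """Tally token counts per bucket from records like
    {"bucket": "cache_read", "tokens": 1234}."""
    totals = dict.fromkeys(BUCKETS, 0)
    for line in jsonl_lines:
        record = json.loads(line)
        bucket = record.get("bucket")
        if bucket in totals:
            totals[bucket] += record.get("tokens", 0)
    return totals

lines = [
    '{"bucket": "system_prompt", "tokens": 14328}',
    '{"bucket": "user_content", "tokens": 5000}',
    '{"bucket": "cache_read", "tokens": 100000}',
]
print(bucket_tokens(lines))
```

Summing per bucket across a whole session is what exposes how little of the budget falls into the `user_content` column.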
The most stubborn of these overheads is the immutable system prompt. Every API call to Claude Code includes a roughly 14,328‑token block that encodes the model’s internal instructions, tool definitions, safety guidelines, and the user’s CLAUDE.md file. This “tax” is baked into the request payload and cannot be trimmed, meaning that of the 200k-token limit, only about 186k tokens are ever available for the evolving conversation (Slim). The prompt reappears on each turn, resetting the cache‑read counter to exactly that figure after any compaction event, confirming its role as a fixed floor in the token ledger.
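The arithmetic behind that fixed floor is straightforward; the figures below are the ones reported in the tracing:

```python
CONTEXT_BUDGET = 200_000        # advertised context window
SYSTEM_PROMPT_TOKENS = 14_328   # fixed per-request overhead reported in the tracing

# Tokens actually available for the evolving conversation
usable = CONTEXT_BUDGET - SYSTEM_PROMPT_TOKENS
print(usable)  # 185672, i.e. roughly 186k tokens
```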
Compaction introduces a second, variable overhead. When a session’s token count approaches the 200k ceiling, Claude Code automatically generates a summary of the entire dialogue, replacing the raw history with a condensed representation. These summaries range from 11k to 19k tokens, depending on session length and content density, and they must be re‑processed from scratch each time a new compaction fires (Slim). Because the summary itself is subject to the same system‑prompt inclusion, each compaction effectively resets the token budget, forcing the model to re‑ingest a large chunk of cached data and incurring the full input price for that segment.
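A sketch of the compaction trigger as described; the threshold and summary size here are assumptions (Claude Code’s actual heuristic is not documented in the source, and the 15k summary is a midpoint of the 11k–19k range observed):

```python
def maybe_compact(history_tokens, limit=200_000, threshold=0.92, summary_tokens=15_000):
    """Hypothetical compaction check: when the running total nears the
    context ceiling, the raw history is replaced by a summary."""
    if history_tokens >= int(limit * threshold):
        # The summary replaces the history, but the next request still
        # pays the full system-prompt overhead on top of it.
        return summary_tokens
    return history_tokens

print(maybe_compact(190_000))  # near the ceiling: history collapses to 15000
print(maybe_compact(50_000))   # well under: history unchanged at 50000
```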
Anthropic’s server‑side prompt caching mitigates the financial impact of repeatedly sending the same conversation history, but it does not reduce the token count that the client must allocate. The cache works by recognizing exact token prefixes that have been seen recently and serving them at a discounted read rate of $1.50 per million tokens versus $15 per million for fresh “cache‑creation” processing (Slim). In a 157‑turn session, the study measured that 98% of tokens were served from the cache, confirming the efficiency of the mechanism. However, the cache’s short time‑to‑live—estimated at around five minutes—means that any pause longer than that forces the next request to pay the full input price for the entire history, erasing the cost savings accrued earlier (Slim).
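Using the quoted rates, a rough per-turn input-cost estimate shows how much the cache matters, and what a cache expiry costs. The `turn_cost` helper and its parameters are illustrative, not part of Anthropic’s API:

```python
CACHE_READ_PER_M = 1.50    # $/M tokens for cached reads (rate quoted above)
CACHE_WRITE_PER_M = 15.00  # $/M tokens for fresh cache-creation processing

def turn_cost(total_tokens, cached_fraction):
    """Estimate input cost for one turn given the share served from cache."""
    cached = total_tokens * cached_fraction
    fresh = total_tokens - cached
    return (cached * CACHE_READ_PER_M + fresh * CACHE_WRITE_PER_M) / 1_000_000

# 150k-token history at the 98% cache-hit rate measured in the 157-turn session
warm = turn_cost(150_000, 0.98)
# Same history after a pause longer than the ~5-minute TTL: nothing is cached
cold = turn_cost(150_000, 0.0)
print(f"warm: ${warm:.4f}  cold: ${cold:.2f}")
```

The cold request costs roughly eight times the warm one, which is why a steady interaction cadence matters.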
The final category, the user’s own input, accounts for roughly a quarter of the total token flow. Every time the user presses Enter, the entire conversation, including the system prompt and all prior turns, is sent to the Claude API, which is stateless and has no memory of earlier messages (Slim). Consequently, each new turn adds more tokens to the request payload, and the session’s latency and expense grow progressively. The compaction process, while intended to keep the session within the 200k limit, actually incurs a hidden cost: it discards the cached conversation and forces a fresh cache‑creation pass for the newly generated summary, temporarily inflating both token usage and monetary charge (Slim).
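That stateless growth pattern can be sketched as a running sum; the per-turn numbers below are illustrative:

```python
def payload_tokens(turn_tokens, system_prompt=14_328):
    """Cumulative tokens sent per request: a stateless API receives the
    system prompt plus the entire prior conversation on every turn."""
    sent = []
    history = 0
    for t in turn_tokens:
        history += t
        sent.append(system_prompt + history)
    return sent

# Three turns of 500, 800, and 1200 tokens of new content
print(payload_tokens([500, 800, 1200]))
# Each request carries the full running history, not just the new turn.
```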
For enterprises that rely on Claude Code for code generation or other productivity tasks, these findings have practical implications. The fixed system‑prompt overhead and periodic compaction summaries mean that the effective usable context shrinks as sessions lengthen, potentially requiring more frequent session resets or strategic prompting to stay within budget. Moreover, the reliance on short‑lived caching underscores the importance of maintaining a steady interaction cadence; otherwise, the cost advantage of cached reads evaporates. As Anthropic tightens enforcement against unauthorized third‑party harnesses (VentureBeat) and clarifies tool‑access policies (The Register), developers may see tighter controls that further emphasize efficient token management. Understanding the exact flow of tokens—now mapped in detail by Slim’s tracing—allows teams to optimize their Claude Code usage, balance session length against cost, and avoid the hidden “taxes” that silently erode productivity budgets.
Sources
No primary source found (coverage-based)
- Dev.to AI Tag
Reporting based on verified sources and public filings. Sector HQ editorial standards require multi-source attribution.