Claude Code quietly routes subagent traffic to Haiku, with telemetry showing 36% of one agent's calls on the model
Telemetry published by developer Constantine Mirin indicates that Claude Code silently routes much of its subagent traffic to the Haiku model, with roughly 36% of one agent's API calls hitting Haiku, a pattern uncovered after a brief Opus outage prompted a switch to Sonnet and the activation of an OpenTelemetry pipeline.
Quick Summary
- Telemetry published by developer Constantine Mirin indicates that Claude Code silently routes much of its subagent traffic to the Haiku model, with roughly 36% of one agent's API calls hitting Haiku, a pattern uncovered after a brief Opus outage prompted a switch to Sonnet and the activation of an OpenTelemetry pipeline.
- Key company: Claude Code
Haiku has quietly become the default engine for Claude Code’s internal research subagents, a shift that surfaced only after an Opus outage forced a temporary switch to Sonnet. In a detailed telemetry dump posted by developer Constantine Mirin on mirin.pro, the dashboard revealed that roughly 95% of API calls were being routed to Haiku, even though the user’s global settings listed Sonnet as the preferred model for all projects. The data shows 7,222 API calls across five codebases, consuming 2.55 million input tokens and 2.38 million output tokens, while cache reads ballooned to 451 million tokens, a pattern that translates to an estimated $354 in usage cost, a fraction of what Opus would have charged (Mirin).
The breakdown by agent underscores Haiku’s dominance. The “salesagent” subagent logged 1,727 calls to Haiku versus 1,657 to Opus and 1,377 to Sonnet, putting Haiku at 36% of that agent’s traffic. Similar ratios appeared in the “pb‑cards” (41% Haiku), “adcp‑req” (32% Haiku), and “imagefactory‑v2” (39% Haiku) projects, with one project even registering more Haiku calls than either of the explicitly selected models. This suggests that Claude Code’s architecture silently delegates the exploratory phase of code generation—searching files, classifying text, and other lightweight tasks—to Haiku, regardless of the model declared in the user’s configuration (Mirin).
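The percentages above follow directly from the raw call counts. A quick sketch of the arithmetic for the “salesagent” subagent:

```python
# Per-model call counts for the "salesagent" subagent, as reported
# in Mirin's telemetry dump.
calls = {"haiku": 1727, "opus": 1657, "sonnet": 1377}

total = sum(calls.values())  # 4,761 calls for this subagent
shares = {model: round(100 * n / total) for model, n in calls.items()}

print(shares)  # Haiku lands at 36% of the agent's traffic
```

The same division applied to the other projects yields the 32–41% Haiku shares quoted above.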
According to Mirin’s analysis, the hidden routing stems from Claude Code’s “Explore” subagent, a built‑in component hard‑coded to Haiku. Each exploratory request is a stateless call that consistently returns exactly 32 output tokens, consumes between 300 and 2,000 input tokens, and never hits the cache. Because these calls are generated internally, they do not appear in the UI or in the platform’s /stats endpoint, making external telemetry the only way to surface them. The author traced the pattern by inspecting logs that showed a steady stream of these 32‑token responses during active coding sessions, confirming that Haiku handles the bulk of the research workload while Sonnet or Opus are reserved for the final synthesis step (Mirin).
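Mirin’s detection heuristic can be approximated as a simple filter over exported telemetry records. The record field names and model identifiers below are illustrative assumptions, not the actual OpenTelemetry attribute keys Claude Code emits:

```python
# Sketch: flag API calls matching the fingerprint Mirin describes for
# the hidden "Explore" subagent: a Haiku call with exactly 32 output
# tokens, 300-2,000 input tokens, and no cache reads.
# Field names and model strings are hypothetical placeholders.
def looks_like_explore(record: dict) -> bool:
    return (
        "haiku" in record["model"]
        and record["output_tokens"] == 32
        and 300 <= record["input_tokens"] <= 2000
        and record["cache_read_tokens"] == 0
    )

records = [
    {"model": "claude-haiku", "input_tokens": 812,
     "output_tokens": 32, "cache_read_tokens": 0},
    {"model": "claude-sonnet", "input_tokens": 45_000,
     "output_tokens": 1_900, "cache_read_tokens": 210_000},
]

explore_calls = [r for r in records if looks_like_explore(r)]
print(len(explore_calls))  # 1
```

A steady stream of records passing this filter during an active coding session is the pattern Mirin reports observing in his logs.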
The cost implications are stark. Haiku is billed at roughly one‑twentieth the price of Opus per token, positioning it as the most economical choice for high‑frequency, low‑complexity operations. By offloading the majority of subagent calls to Haiku, Claude Code can keep overall expenses low while still leveraging the more powerful Sonnet or Opus models for deep reasoning tasks such as architectural design or complex refactoring. Mirin’s telemetry indicates that this hybrid approach already yields a modest $354 bill for over seven thousand calls—a figure that would have been substantially higher if Opus had handled the same volume (Mirin).
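The scale of the saving is easy to ballpark from the reported token counts. The per-million-token prices below are stand-ins chosen only to illustrate the roughly twentyfold gap the article describes, not Anthropic’s actual rate card, and cache reads, which dominate the token volume in Mirin’s dump, are ignored for simplicity:

```python
# Sketch: what the reported 2.55M input / 2.38M output tokens would
# cost on Haiku versus Opus. Prices per million tokens are
# illustrative placeholders for the ~20x gap, not real rates.
input_mtok, output_mtok = 2.55, 2.38

def cost(in_price: float, out_price: float) -> float:
    return input_mtok * in_price + output_mtok * out_price

haiku_cost = cost(1.0, 5.0)    # hypothetical $1 / $5 per MTok
opus_cost = cost(20.0, 100.0)  # hypothetical 20x Haiku pricing

print(f"Haiku ~${haiku_cost:.2f}, Opus ~${opus_cost:.2f}")
```

Whatever the exact rates, the ratio is what matters: routing high-frequency exploratory calls to the cheaper model shrinks the bill by roughly the same factor as the price gap.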
The discovery also raises questions about transparency. Users who configure Claude Code to run exclusively on Sonnet or Opus may be unaware that a substantial portion of their workload is being executed on a different model without explicit consent or visibility. While the practice improves speed and reduces cost, it sidesteps the platform’s stated model‑selection mechanisms, potentially complicating budgeting and compliance for enterprises that track AI usage at the model level. As Claude Code continues to scale, developers and organizations will likely demand clearer reporting tools that surface subagent activity alongside primary model usage, a need highlighted by Mirin’s reliance on an external OpenTelemetry pipeline to uncover the hidden traffic.
Sources
No primary source found (coverage-based)
- Dev.to AI Tag
This article was created using AI technology and reviewed by the SectorHQ editorial team for accuracy and quality.