Block Launches LLM Router Benchmark Testing 46 Models Across 8 Providers in Sub‑1 ms

Published by
SectorHQ Editorial
Photo by Solen Feyissa (unsplash.com/@solenfeyissa) on Unsplash

Forty-six LLMs from eight providers are now routed in under 1 ms, and the benchmark behind that figure shows that speed and intelligence are poorly correlated, according to a recent report on Block’s routing system.

Key Facts

  • Key company: Block

Block’s new routing system, dubbed ClawRouter v0.12.47, demonstrates that raw latency is a poor proxy for model capability. In a production‑grade benchmark that routed every request through BlockRun’s x402 micropayment gateway—adding a mandatory 50‑100 ms verification step—the team measured end‑to‑end wall‑clock times for 46 large‑language‑model (LLM) endpoints across eight providers (OpenAI, Anthropic, Google, xAI, DeepSeek, Moonshot, MiniMax, NVIDIA, Z.AI). The results, posted on blockrun.ai on March 21, show a seven‑fold spread between the fastest and slowest models, with the quickest—Google’s Gemini‑3.1‑pro at 1,609 ms—still taking more than a second after payment overhead, while OpenAI’s GPT‑5‑series lingered between 3.5 s and 8 s (source: BlockRun benchmark report).
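The end-to-end measurement described above can be sketched as follows. This is an illustrative reconstruction, not BlockRun's harness: the function names and the simulated delays are assumptions, but it shows why a mandatory 50–100 ms payment-verification step dominates short requests.

```python
import time

def measure_end_to_end(call_model, verify_payment):
    """Return total wall-clock ms for one routed request,
    including the mandatory payment-verification step."""
    start = time.perf_counter()
    verify_payment()   # x402 micropayment check (~50-100 ms in the report)
    call_model()       # actual LLM inference
    return (time.perf_counter() - start) * 1000.0

# Simulated stand-ins for a payment gateway and a low-latency endpoint.
def fake_payment():
    time.sleep(0.05)   # 50 ms verification floor

def fast_model():
    time.sleep(0.01)   # stand-in for a fast model's inference time

total_ms = measure_end_to_end(fast_model, fake_payment)
print(f"end-to-end: {total_ms:.0f} ms")
```

For a 10 ms model call, the fixed verification delay accounts for most of the wall-clock time, which is why the report treats payment overhead as the latency floor.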

Speed alone, however, does not dictate suitability for a given workload. The report cross‑referenced latency with the Artificial Analysis Intelligence Index v4.0, a composite score aggregating GPQA, MMLU, MATH, HumanEval and other academic benchmarks. Models that sat near the “sweet spot” of low latency, high IQ, and reasonable cost per million tokens included Google’s Gemini‑3.1‑pro (57 IQ, $2.00/M) and Gemini‑3‑flash‑prev (46 IQ, $0.50/M). By contrast, OpenAI’s flagship GPT‑5.4 matched Gemini’s IQ (57) but incurred a $2.50/M price tag and a 6.2‑second response time, while its cheaper GPT‑4.1‑nano, despite a $0.10/M cost, was still twice as slow as Google’s cheapest offering. Anthropic’s Claude‑opus‑4.6 and Claude‑sonnet‑4.6 delivered mid‑range IQ scores (53 and 52, respectively) with latencies around 2.1 s, positioning them as viable alternatives for reasoning‑heavy tasks that demand more than raw speed (source: BlockRun benchmark report).
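The sweet-spot trade-off can be made concrete with a toy scoring function over the figures quoted above. The weighting below is an assumption for illustration only; the report does not publish its formula. Even so, it shows how two models with identical IQ separate once latency and price enter the score.

```python
def sweet_spot_score(iq, latency_s, usd_per_m, w_iq=1.0, w_lat=1.0, w_cost=1.0):
    """Toy composite score: higher IQ helps, latency and cost count against.
    The weights are illustrative knobs, not values from the report."""
    return w_iq * iq - w_lat * latency_s - w_cost * usd_per_m

# Figures quoted in the report for two same-IQ models.
gemini = sweet_spot_score(57, 1.609, 2.00)  # Gemini-3.1-pro
gpt54  = sweet_spot_score(57, 6.200, 2.50)  # GPT-5.4

print(f"Gemini-3.1-pro: {gemini:.2f}, GPT-5.4: {gpt54:.2f}")
```

With equal intelligence scores, the lower-latency, cheaper model wins under any weighting that penalizes delay and cost at all, which is the report's core argument for routing rather than defaulting to one flagship.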

To reconcile these divergent dimensions, BlockRun built a production router that classifies incoming requests in under one millisecond using 14 weighted features—such as token count, temperature, and expected reasoning depth—combined with sigmoid confidence calibration. The router’s decision matrix evaluates each request against the benchmarked performance‑cost‑quality profile of every model, then selects the optimal endpoint in real time. This approach allows users to set “model: auto” and receive a recommendation that balances cost, latency, and intelligence rather than defaulting to the cheapest or fastest model. According to the report, the router’s sub‑millisecond classification overhead is negligible compared to the 50‑100 ms payment verification delay, effectively eliminating the “infinite wrong choices” problem that plagued earlier single‑gateway implementations (source: BlockRun benchmark report).
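The classification step described above can be sketched as a linear model over weighted request features squashed through a sigmoid. The report cites 14 features but does not publish them, so the three features, weights, and bias below are hypothetical; the point is that such a computation is trivially sub-millisecond.

```python
import math

# Hypothetical feature weights; the report's actual 14 features are unpublished.
WEIGHTS = {
    "token_count":              0.002,  # longer prompts lean complex
    "temperature":             -0.5,    # high temperature suggests creative, not exact
    "expected_reasoning_depth": 1.2,    # deeper reasoning needs a smarter model
}
BIAS = -1.0

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def complexity_confidence(features):
    """Calibrated confidence in [0, 1] that the request needs a high-IQ endpoint."""
    z = BIAS + sum(WEIGHTS[k] * features.get(k, 0.0) for k in WEIGHTS)
    return sigmoid(z)

simple = complexity_confidence(
    {"token_count": 20, "temperature": 0.7, "expected_reasoning_depth": 0})
hard = complexity_confidence(
    {"token_count": 800, "temperature": 0.1, "expected_reasoning_depth": 3})
print(f"simple: {simple:.2f}, hard: {hard:.2f}")
```

A dot product over a dozen features plus one `exp` costs microseconds, which is consistent with the report's claim that classification overhead is negligible next to the 50–100 ms payment verification.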

Industry observers have noted that BlockRun’s methodology underscores a broader shift toward multi‑model orchestration. As Tom’s Hardware highlighted in its March 2026 coverage, the proliferation of specialized LLMs means that developers can no longer rely on a one‑size‑fits‑all provider; instead, they must dynamically match workloads to the most appropriate model (source: Tom’s Hardware archive). BlockRun’s benchmark also confirms the dominance of Google and xAI in the speed category—11 of the top 13 fastest models hail from those two firms—while OpenAI’s flagship offerings lag behind despite their brand cachet. This performance gap may pressure OpenAI to optimize its inference pipelines or adjust pricing to remain competitive in latency‑sensitive applications such as real‑time code generation or interactive agents (source: BlockRun benchmark report).

The practical implications for enterprises are significant. By integrating a sub‑millisecond router that accounts for 14 nuanced dimensions, BlockRun enables cost‑effective scaling of AI‑driven services without sacrificing the quality needed for complex tasks like theorem proving or concurrent data‑structure implementation. As the report notes, a “what is Python?” query can be directed to the cheapest, fastest model, whereas a request to “implement a B‑tree with concurrent insertions” will be routed to a higher‑IQ endpoint, ensuring that the system delivers both efficiency and competence. This granular routing capability could become a differentiator for platforms that charge per‑request fees, especially in markets where on‑chain micropayment verification adds unavoidable latency (source: BlockRun benchmark report).
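The "model: auto" behavior the report describes can be sketched with a deliberately crude dispatcher. The keyword heuristic and endpoint names below are illustrative assumptions, not BlockRun's classifier, but they capture the routing outcome for the report's two example queries.

```python
# Illustrative endpoints drawn from the benchmark's sweet-spot discussion.
CHEAP_FAST = "gemini-3-flash-prev"  # low cost, low latency
HIGH_IQ    = "gemini-3.1-pro"       # higher IQ for complex work

# Hypothetical markers of reasoning-heavy requests.
HARD_HINTS = ("implement", "prove", "concurrent", "optimize", "design")

def route(prompt):
    """Pick an endpoint: escalate reasoning-heavy or long prompts,
    send everything else to the cheapest, fastest model."""
    text = prompt.lower()
    if any(hint in text for hint in HARD_HINTS) or len(text.split()) > 50:
        return HIGH_IQ
    return CHEAP_FAST

print(route("What is Python?"))
print(route("Implement a B-tree with concurrent insertions"))
```

A production router would replace the keyword check with the calibrated feature model, but the dispatch shape is the same: trivial queries hit the cheap endpoint, and the report's B-tree example escalates.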

Sources

Primary source

No primary source found (coverage-based)

Other signals
  • Dev.to AI Tag

Reporting based on verified sources and public filings. Sector HQ editorial standards require multi-source attribution.
