Nvidia powers record‑fast DeepSeek‑R1 inference, boosting revenue 25× at 20× lower cost.
Photo by BoliviaInteligente (unsplash.com/@boliviainteligente) on Unsplash
Just weeks after the H100 set the benchmark, NVIDIA’s Blackwell chips now run DeepSeek‑R1 with 25× more revenue per token and 20× lower cost per token, NVIDIA AI Twitter reports.
Quick Summary
- Just weeks after the H100 set the benchmark, NVIDIA’s Blackwell chips now run DeepSeek‑R1 with 25× more revenue per token and 20× lower cost per token, NVIDIA AI Twitter reports.
- Key company: NVIDIA
- Also mentioned: DeepSeek
NVIDIA’s Blackwell architecture is now the reference point for large‑scale LLM inference, a shift that began with the H100’s record and accelerated dramatically in the past month. According to NVIDIA’s AI‑focused Twitter account, a single NVL8 system—eight Blackwell GPUs linked together—delivers 253 tokens per second per user (TPS/user) and a total system throughput of roughly 30,000 TPS when running the full 671‑billion‑parameter DeepSeek‑R1 model. That performance translates into a 25‑fold increase in revenue generation per token compared with the H100, while the cost per token falls by a factor of twenty, the same account reported. The gains are attributed to TensorRT DeepSeek optimizations that exploit the new FP4 precision path, which “delivers state‑of‑the‑art” efficiency on Blackwell silicon.
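The two throughput figures above imply a rough concurrency level for a single NVL8 system. A back‑of‑the‑envelope check, using only the reported numbers:

```python
# Sanity check of the reported NVL8 figures (numbers from the article).
per_user_tps = 253       # tokens per second per user, as reported
system_tps = 30_000      # aggregate tokens per second per NVL8 system

# Implied number of users one system can serve concurrently
# while sustaining the full per-user rate.
concurrent_users = system_tps / per_user_tps
print(f"~{concurrent_users:.0f} concurrent users per NVL8 system")  # ~119
```

In other words, each eight‑GPU system can hold the headline per‑user speed for on the order of a hundred simultaneous sessions before aggregate throughput becomes the bottleneck.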
The performance jump is not merely a hardware story; it reflects a co‑design effort between NVIDIA and the DeepSeek team. Bloomberg notes that a U.S. lawmaker cited NVIDIA’s “optimized co‑design of algorithms, frameworks and hardware” as the driver behind the R1 model’s cutting‑edge speed. By aligning the model’s computational graph with Blackwell’s tensor cores and integrating the TensorRT inference engine, NVIDIA has reduced memory traffic and latency, enabling the dramatic throughput gains observed in the NVL8 benchmark. Tom’s Hardware corroborates the claim, reporting a 45% increase in inference throughput over earlier B200 results.
Beyond raw speed, NVIDIA is packaging DeepSeek‑R1 as a ready‑to‑deploy microservice. The company announced that the 671‑billion‑parameter model is available in preview as an NVIDIA NIM (NVIDIA Inference Microservice) on its build portal, allowing developers to “securely experiment and build your own specialized agents,” according to a second NVIDIA AI Twitter post. This move lowers the barrier to entry for enterprises that want to leverage a model of this scale without investing in custom training pipelines, potentially expanding the addressable market for Blackwell‑based inference services.
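NIM services expose an OpenAI‑style chat‑completions interface, so experimenting with the preview is a matter of posting a standard request. The sketch below assumes the endpoint URL, model identifier, and `NVIDIA_API_KEY` environment variable shown; check the build portal for the actual values for your account.

```python
import json
import os
import urllib.request

# Assumed endpoint and model id for the DeepSeek-R1 NIM preview;
# verify both against build.nvidia.com before use.
NIM_URL = "https://integrate.api.nvidia.com/v1/chat/completions"
MODEL_ID = "deepseek-ai/deepseek-r1"

def build_request(prompt: str, max_tokens: int = 512) -> dict:
    """Assemble an OpenAI-style chat-completion payload for the NIM."""
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "stream": False,
    }

payload = build_request("Summarize FP4 quantization in one paragraph.")

api_key = os.environ.get("NVIDIA_API_KEY")
if api_key:  # only make the network call when a key is configured
    req = urllib.request.Request(
        NIM_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["choices"][0]["message"]["content"])
```

Because the interface follows the OpenAI wire format, existing client code can typically be pointed at the NIM endpoint with only a base‑URL and model‑name change.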
From a market perspective, the 25× revenue uplift and 20× cost reduction reshape the economics of deploying massive LLMs at scale. If the per‑token cost falls to a fraction of what it was on H100‑based clusters, cloud providers and enterprise AI teams can run more queries within the same budget, accelerating adoption of high‑parameter models for tasks such as code generation, scientific reasoning, and personalized assistants. The reported throughput of 30,000 TPS per system also suggests that a relatively modest fleet of Blackwell‑equipped nodes could service workloads that previously required dozens of H100 servers, translating into lower data‑center power and cooling expenses.
Analysts will now watch how quickly NVIDIA can convert these benchmark claims into commercial contracts. The company’s ability to monetize the Blackwell‑DeepSeek synergy will depend on the speed of NIM rollout, the robustness of the FP4 precision path in production workloads, and the willingness of cloud operators to replace existing H100 infrastructure. While the Twitter metrics—nearly 3,000 likes on the performance announcement—signal strong industry interest, the real test will be whether the “25× more revenue” claim materializes in billings once enterprises begin charging end users for token consumption on Blackwell‑powered services. Until those figures materialize, the record‑fast inference remains a compelling proof point that could tilt the competitive balance toward NVIDIA in the high‑end LLM market.
This article was created using AI technology and reviewed by the SectorHQ editorial team for accuracy and quality.