Nvidia B100 Mirrors H100 with HBM3E, Boosts Performance Metrics Across B200/B300 Lineup

While Nvidia’s Blackwell roadmap was shrouded in mystery, a recent report shows the B100 essentially mirrors the H100 with HBM3E, and the B200/B300 deliver markedly higher tensor‑core throughput.

Key Facts

•Key company: Nvidia

The B200’s specifications, uncovered in a Reddit thread where a user with early access posted detailed telemetry, reveal a core count of 18,944 and a boost clock of 1,965 MHz, delivering roughly 1,191 TFLOPS of FP16 tensor‑core throughput [Reddit]. Those figures line up with the performance tables in Nvidia’s Blackwell technical brief, which lists the same clock rates and core architecture for the Blackwell family [Nvidia technical brief]. By extrapolating from the documented FP16 density, analysts can infer the B200’s FP64, FP8, and INT8 capabilities, establishing a clear performance hierarchy that places the B200 ahead of the Hopper‑based H100 in raw tensor throughput while remaining on a comparable power envelope.

The B100, according to the same technical brief and the Blackwell ultra datasheet, is essentially a re‑skinned H100 that swaps the HBM2e memory subsystem for HBM3E [Nvidia datasheet]. This memory upgrade raises bandwidth to roughly 3 TB/s, matching the H100’s peak compute but offering a modest latency advantage for large‑scale model training. Nvidia has not released an official SKU for the B100, but the documentation’s block diagram and memory interface specifications confirm the H100‑core layout with the newer HBM3E stack, confirming the “H100 with HBM3E” characterization made by the community researcher.

The B300 builds on the B200’s core count and clock but reallocates a portion of its compute budget toward FP4 tensor operations. Nvidia’s datasheet notes a 50 % uplift in FP4 density, pushing the B300 to 14.29 PFLOPS of FP4 versus the B200’s 9.53 PFLOPS [Nvidia datasheet]. To accommodate this shift, the B300 trims FP64 and INT8 performance to levels comparable with the upcoming GeForce 5090, effectively trading double‑precision and integer inference speed for higher‑precision floating‑point throughput that benefits generative‑AI workloads. The chip therefore occupies a niche between the B200’s balanced tensor suite and the B202 variant slated for the 5090, offering a specialized tool for models that rely heavily on FP4 precision.

From a market perspective, the Blackwell lineup’s incremental improvements echo Nvidia’s broader strategy of layering modest architectural tweaks atop proven designs to sustain its dominance in the data‑center GPU market. By reusing the H100 core architecture for the B100 and simply upgrading the memory stack, Nvidia can accelerate time‑to‑market while leveraging existing software stacks. The B200’s higher clock and expanded core count translate directly into faster training cycles for large language models, a key selling point for enterprise customers racing to deploy next‑generation AI. Meanwhile, the B300’s FP4 focus aligns with the industry’s shift toward mixed‑precision training, where reduced‑precision formats can cut compute costs without sacrificing model quality.

Overall, the disclosed metrics suggest that Nvidia’s Blackwell family will deliver a measurable performance lift across the board, with the B200 and B300 offering roughly 20–30 % higher tensor‑core throughput than the H100, and the B100 providing a memory‑centric upgrade path for existing H100 deployments. The data, drawn from Nvidia’s own technical brief, the Blackwell ultra datasheet, and the community‑sourced Reddit post, paints a picture of a product line that is less a radical redesign and more a strategic refinement—one that reinforces Nvidia’s pricing power as AI compute demand continues to surge.

Nvidia B100 Mirrors H100 with HBM3E, Boosts Performance Metrics Across B200/B300 Lineup

Key Facts

Sources

🏢Companies in This Story

Related Stories