Meta Unveils Four New MTIA Inference Chips, Claiming They Outperform Nvidia’s GPUs
Photo by Alexandre Debiève on Unsplash
While most analysts still tout Nvidia’s GPUs as the AI benchmark, Meta now claims its four new MTIA inference chips—models 300, 400, 450 and 500—outperform them, according to The Register.
Key Facts
- Key company: Meta
Meta’s four new MTIA chips—models 300, 400, 450 and 500—represent a rapid, chiplet‑based iteration cycle that the company says will deliver inference performance that eclipses contemporary Nvidia GPUs. According to Meta’s own blog, the accelerators are built “in close partnership with Broadcom” and employ a modular architecture that lets the firm swap out compute, networking or memory chiplets without redesigning the entire die (Meta AI blog). This approach, described by Tom’s Hardware, enables a six‑month cadence for new generations, a tempo that far outpaces the typical two‑year refresh cycle of most fabless silicon vendors (Tom’s Hardware).
The MTIA 300 is positioned as a communications‑focused accelerator for ranking and recommendation (R&R) workloads. It combines a single compute chiplet with two network chiplets and multiple HBM stacks, each compute chiplet housing a grid of processing elements (PEs) that include redundant units to improve manufacturing yield (The Register). The redundancy strategy mirrors techniques used in high‑volume data‑center silicon, where spare units raise the share of usable parts and so lower the cost per good die. The 300’s HBM bandwidth is quoted at 6.1 TB/s, which the company claims already meets the demands of its internal recommendation engines.
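To see why spare PEs matter for yield, consider the simple model below. It is a minimal sketch: the per‑PE yield probability and grid sizes are illustrative assumptions, not figures Meta or Broadcom have published.

```python
from math import comb

# Illustrative model of why spare (redundant) PEs raise chiplet yield.
# Every number here is an assumption made for the example, not a Meta figure.
P_PE_GOOD = 0.99      # assumed probability that any single PE is defect-free
USABLE_PES = 60       # PEs the chiplet must expose to software
SPARE_PES = 4         # extra PEs fabricated purely as spares
TOTAL_PES = USABLE_PES + SPARE_PES

def yield_with_spares(total, required, p):
    """Probability that at least `required` of `total` PEs come out functional."""
    return sum(comb(total, k) * p**k * (1 - p)**(total - k)
               for k in range(required, total + 1))

print(f"Chiplet yield with no spares: {P_PE_GOOD**USABLE_PES:.4f}")
print(f"Chiplet yield with {SPARE_PES} spares:  {yield_with_spares(TOTAL_PES, USABLE_PES, P_PE_GOOD):.4f}")
```

Even a handful of spares turns a part that would be scrapped nearly half the time into one that almost always ships, which is the economic logic behind the redundancy.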
The subsequent MTIA 400 moves the focus to general AI inference, with Meta stating that the chip is “heading to data centers now” (Meta AI blog). While specific performance numbers for the 400 have not been disclosed, the roadmap indicates a scaling of HBM bandwidth to 13.2 TB/s—roughly double that of the 300—and a corresponding increase in compute density. The 450 and 500 models push the bandwidth ceiling further, to 20.4 TB/s and 27.6 TB/s respectively, with the 500 sitting roughly 4.5× above the 300 generation (Meta AI blog). This bandwidth scaling is critical because, as the Meta AI blog notes, memory bandwidth has become the primary bottleneck for large‑language‑model (LLM) inference.
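A quick back-of-the-envelope calculation shows why those bandwidth figures translate fairly directly into serving speed. The sketch below assumes a memory-bound decode step in which every generated token streams the full weight set from HBM; the 70-billion-parameter, 8-bit model is an illustrative assumption, not a Meta workload.

```python
# Upper bound on single-stream decode speed when LLM inference is
# memory-bandwidth bound (each generated token reads all weights once).
# Model size and precision below are assumptions for illustration only.
HBM_BW_TBPS = {"MTIA 300": 6.1, "MTIA 400": 13.2,
               "MTIA 450": 20.4, "MTIA 500": 27.6}

PARAMS = 70e9           # assumed 70B-parameter model
BYTES_PER_PARAM = 1     # assumed 8-bit weights
weight_bytes = PARAMS * BYTES_PER_PARAM

for chip, tbps in HBM_BW_TBPS.items():
    tokens_per_s = (tbps * 1e12) / weight_bytes
    scale = tbps / HBM_BW_TBPS["MTIA 300"]
    print(f"{chip}: ~{tokens_per_s:4.0f} tokens/s ceiling ({scale:.1f}x the 300)")
```

Under those assumptions the decode ceiling rises from roughly 87 tokens per second on the 300 to about 394 on the 500, simply because the weights can be streamed that much faster.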
Beyond raw bandwidth, Meta emphasizes a low‑precision data path tailored for inference. The MTIA 500, the flagship of the series, is reported to achieve 30 peta‑FLOPS (PFLOPS) using custom data types that preserve model accuracy while maximizing throughput (Meta AI blog). These data types are integrated into the chip’s vector units, each built around a pair of RISC‑V cores with vector extensions, allowing the accelerator to process more operations per clock cycle than the FP16 or BF16 pathways common in Nvidia’s A100 and H100 GPUs. The company also highlights software parity: the MTIA family supports native PyTorch execution via torch.compile, Triton kernels and a vLLM plugin, enabling models to run on Meta silicon without code rewrites (Meta AI blog).
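The PyTorch claim is easiest to picture with a concrete entry point. The snippet below is a generic torch.compile example that runs on any PyTorch 2.x install; whether an identical script executes unmodified on MTIA hardware is Meta’s claim, and no MTIA-specific device or backend name is shown here because none has been publicly documented in detail.

```python
import torch
import torch.nn as nn

# A toy two-layer model standing in for a ranking or inference graph.
model = nn.Sequential(
    nn.Linear(1024, 4096),
    nn.GELU(),
    nn.Linear(4096, 1024),
).eval()

# torch.compile is the standard PyTorch 2.x entry point Meta says MTIA
# supports natively; on CPUs and GPUs it lowers to Triton or C++ kernels
# through the default backend. The MTIA lowering path is not shown here.
compiled = torch.compile(model)

with torch.no_grad():
    x = torch.randn(8, 1024)     # batch of 8 feature vectors
    out = compiled(x)            # first call triggers compilation
    print(out.shape)             # torch.Size([8, 1024])
```

If the parity claim holds, the appeal is obvious: the same compile-and-serve path Meta’s engineers already use for GPUs would carry over to its own silicon with no kernel rewrites.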
Broadcom’s involvement extends beyond design assistance. In a statement, Broadcom confirmed that Meta plans to install “multiple gigawatts” of MTIA silicon by 2027, a scale that would rival the total power draw of the largest GPU farms currently in operation (The Register). This gigawatt‑level deployment suggests Meta intends to replace a substantial portion of its existing GPU inventory with in‑house accelerators, a move that could reshape the economics of its AI infrastructure. Bloomberg reports that the chips are slated for rollout over the next two years, with the 400 already shipping to Meta data centers and the 450 and 500 expected in 2027.
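The “multiple gigawatts” phrase is easier to grasp as an accelerator count. The arithmetic below is purely illustrative: the per-accelerator power draw is an assumption, since Meta has not published module-level power for any of these parts.

```python
# Rough scale of a "multiple gigawatt" MTIA deployment.
# Both figures are assumptions used only to illustrate the order of magnitude.
DEPLOYMENT_GW = 2.0         # "multiple gigawatts" taken as roughly 2 GW
WATTS_PER_ACCEL = 1_000     # assumed ~1 kW per accelerator incl. overhead

accelerators = DEPLOYMENT_GW * 1e9 / WATTS_PER_ACCEL
print(f"~{accelerators / 1e6:.1f} million accelerators at {WATTS_PER_ACCEL} W each")
```

Even with generous error bars on both numbers, the implied fleet runs into the millions of devices, which squares with the suggestion that a substantial share of GPU inference capacity could be displaced.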
Analysts have noted the strategic shift implied by an “inference‑first” design philosophy. Unlike Nvidia, which builds GPUs primarily for training and then repurposes them for inference, Meta’s MTIA line is purpose‑built for serving models at scale (report). By optimizing memory bandwidth, low‑precision compute and chiplet modularity, Meta aims to achieve higher throughput per watt—a metric increasingly important as AI workloads dominate data‑center power budgets. If the claimed 30 PFLOPS figure for the MTIA 500 holds up in real‑world deployments, it would place the chip ahead of Nvidia’s H100 in pure inference throughput, though head‑to‑head benchmarks have yet to be published. The industry will be watching closely as Meta begins to ship the 400 and later the 450/500, to see whether the performance claims translate into measurable gains in latency, cost and energy efficiency for large‑scale AI services.
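For context, the headline number can be set against Nvidia’s own published peak. The one-line comparison below assumes the 30 PFLOPS figure describes dense low-precision throughput comparable to an H100’s listed FP8 Tensor Core peak of roughly 2 PFLOPS for the SXM part; Meta has not specified the data type behind its number, so the ratio is indicative only.

```python
# Sanity check of the headline claim against Nvidia's published H100 peak.
# Assumes both figures describe comparable dense low-precision throughput,
# which Meta has not confirmed.
MTIA_500_PFLOPS = 30.0           # Meta's claimed figure for the MTIA 500
H100_FP8_DENSE_PFLOPS = 1.98     # Nvidia's listed dense FP8 peak (SXM part)

print(f"Claimed ratio: ~{MTIA_500_PFLOPS / H100_FP8_DENSE_PFLOPS:.0f}x an H100's FP8 peak")
```

A gap that wide between paper specs is exactly why independent, head-to-head benchmarks will matter more than the datasheet once the 450 and 500 reach production.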
Sources
- Reddit: r/LocalLLaMA
This article was created using AI technology and reviewed by the SectorHQ editorial team for accuracy and quality.