Mistral AI launches official NVFP4 model, Mistral‑Small‑4‑119B‑2603‑NVFP4, alongside companion checkpoints
While earlier releases hinted at a pending upgrade, reports indicate Mistral AI has now delivered the official NVFP4 model, debuting the Mistral‑Small‑4‑119B‑2603‑NVFP4.
Key Facts
- Key company: Mistral AI
- Also mentioned: Hugging Face
Mistral AI’s latest push into the open‑source frontier arrives with a trio of new model checkpoints on Hugging Face, each tagged with the NVFP4 moniker that signals a shift to NVIDIA’s newer FP4 quantization format. The flagship release, mistralai/Mistral‑Small‑4‑119B‑2603‑NVFP4, was announced in a brief post that simply declared “Mistral releases an official NVFP4 model, Mistral‑Small‑4‑119B‑2603‑NVFP4!” According to the same source, the model is built on the 119‑billion‑parameter Mistral‑Small‑4‑2603 architecture that Mistral has been iterating on since earlier this year. By moving to the NVFP4 format, the company claims the checkpoint can run more efficiently on NVIDIA hardware, cutting its memory footprint without sacrificing the multilingual fluency the original model demonstrated across a dozen languages, including Arabic, Japanese, and Chinese.
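To make the quantization claim concrete: 4-bit floating-point formats in the FP4 family typically store each weight as an E2M1 value (2 exponent bits, 1 mantissa bit) and share a scale factor across a small block of values. The sketch below illustrates that general block-scaled FP4 idea in plain NumPy; the block size of 16 and the simple max-based scaling are illustrative assumptions, not Mistral’s or NVIDIA’s actual implementation (real NVFP4 also encodes the scales themselves in a low-precision format).

```python
import numpy as np

# Positive magnitudes representable in FP4 E2M1 (2 exponent bits, 1 mantissa bit).
E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_fp4_block(x, block_size=16):
    """Illustrative block-scaled FP4 round-trip: each block of `block_size`
    values shares one scale chosen so the block's max magnitude lands on
    the largest E2M1 value (6.0); every value is then snapped to the
    nearest representable magnitude and rescaled back."""
    x = np.asarray(x, dtype=np.float64)
    pad = (-len(x)) % block_size          # zero-pad to a whole number of blocks
    blocks = np.pad(x, (0, pad)).reshape(-1, block_size)
    # One scale per block; guard against all-zero blocks.
    scales = np.maximum(np.abs(blocks).max(axis=1), 1e-12) / E2M1_GRID[-1]
    scaled = blocks / scales[:, None]
    # Snap each scaled magnitude to the nearest E2M1 grid point, keeping signs.
    idx = np.abs(np.abs(scaled)[..., None] - E2M1_GRID).argmin(axis=-1)
    dequant = np.sign(scaled) * E2M1_GRID[idx] * scales[:, None]
    return dequant.reshape(-1)[:len(x)]
```

The quantization error comes entirely from values that fall between grid points after scaling: `quantize_fp4_block([0.5, -3.0, 6.0, 1.2], block_size=4)` returns the first three values exactly (they sit on the grid) while 1.2 is rounded to 1.0.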
Alongside the NVFP4 checkpoint, Mistral uploaded two companion models to the Hugging Face repository. The first, mistralai/Leanstral‑2603, is listed under the vLLM library with an Apache‑2.0 license and has already been downloaded four times, according to the repository metadata. The second, mistralai/Mistral‑Small‑4‑119B‑2603‑eagle, is a finetuned variant that references the base Mistral‑Small‑4‑119B‑2603 model and carries the same multilingual tags. Its download count sits at five, while it has attracted three “likes” from the community, suggesting a modest but engaged early audience.
The raw download numbers—100 for the base Mistral‑Small‑4‑119B‑2603 model, 4 for Leanstral‑2603, and 5 for the eagle variant—paint a picture of a nascent ecosystem still gathering momentum. All three repositories share the same Apache‑2.0 license, which Mistral AI has used consistently to encourage unrestricted use and downstream innovation. The vLLM tag on each entry hints at compatibility with the high‑throughput inference engine that many developers rely on for serving large language models at scale, a detail that could prove crucial as the community experiments with the new NVFP4 quantization.
What sets these releases apart is not just the quantization tweak but the breadth of language support baked into the checkpoints. The model tags list a dozen locales—English, French, Spanish, German, Italian, Portuguese, Dutch, Japanese, Korean, Chinese, Arabic, and Russian—mirroring Mistral’s earlier claims of “truly multilingual” capabilities. By publishing the NVFP4 version alongside the original FP8‑based checkpoint, Mistral gives developers a direct side‑by‑side comparison of performance versus precision, a move that aligns with the broader open‑source trend of democratizing model optimization. As the community begins to benchmark these models on real‑world workloads, the NVFP4 format could become a reference point for anyone looking to squeeze more efficiency out of massive 119‑billion‑parameter transformers without resorting to proprietary solutions.
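The efficiency argument is easy to sanity-check with back-of-the-envelope arithmetic. The snippet below estimates weights-only memory for a 119-billion-parameter model at common precisions; it deliberately ignores activation memory, KV cache, and per-block scale overhead, so the numbers are rough lower bounds rather than measured figures.

```python
def weights_gib(n_params, bits_per_param):
    """Approximate weights-only memory in GiB for a model storing
    n_params parameters at bits_per_param bits each."""
    return n_params * bits_per_param / 8 / 2**30

N = 119e9  # 119-billion-parameter model
for name, bits in [("BF16", 16), ("FP8", 8), ("FP4", 4)]:
    print(f"{name}: ~{weights_gib(N, bits):.0f} GiB")
```

Halving the bits per weight halves the weights-only footprint, which is why a move from an 8-bit to a 4-bit format matters for fitting a model of this size onto fewer accelerators.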
Reporting based on verified sources and public filings. Sector HQ editorial standards require multi-source attribution.