
Mistral AI Launches Small 4 Model, Leveraging 128 Expert Modules to Outperform Peers

Published by
SectorHQ Editorial

128 expert modules power Mistral Small 4, a 119‑billion‑parameter model that activates only about 6 billion parameters per query, according to The‑Decoder.

Key Facts

  • Key company: Mistral AI

Mistral Small 4’s architecture hinges on a mixture‑of‑experts (MoE) design that splits the 119‑billion‑parameter backbone into 128 specialist modules, each tuned for distinct linguistic or visual tasks. At inference time the model activates only four experts, limiting the compute load to roughly 6 billion active parameters per query, a budget the company says makes the model 40 percent faster than its prior Small 3 release and enables three times the queries‑per‑second throughput on identical hardware (The‑Decoder). This dynamic gating lets developers trade latency for depth: a “quick‑response” mode routes the request through the most efficient experts, while a “think‑harder” mode engages additional experts to boost chain‑of‑thought reasoning without inflating the overall parameter count.
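To make the routing concrete, here is a minimal sketch of top‑4 gating in PyTorch. The layer sizes are illustrative assumptions, not Mistral’s published dimensions. For rough intuition only: if the 119 billion parameters were split evenly across 128 experts, each would hold about 0.93 billion, so four active experts account for roughly 3.7 billion, with the remainder of the reported 6‑billion active budget presumably living in shared attention and embedding layers (an assumption, since Mistral has not published the breakdown).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top4MoELayer(nn.Module):
    """Toy top-4 mixture-of-experts feed-forward layer.

    All dimensions here are illustrative assumptions; Mistral has not
    disclosed Small 4's internal layer sizes.
    """

    def __init__(self, d_model=512, d_ff=2048, n_experts=128, top_k=4):
        super().__init__()
        self.top_k = top_k
        # Router: one logit per expert, computed per token.
        self.gate = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):
        # x: (tokens, d_model)
        logits = self.gate(x)                           # (tokens, n_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)  # keep the 4 best experts per token
        weights = F.softmax(weights, dim=-1)            # renormalize over the selected 4
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e in idx[:, slot].unique().tolist():    # batch all tokens routed to expert e
                mask = idx[:, slot] == e
                out[mask] += weights[mask, slot].unsqueeze(-1) * self.experts[e](x[mask])
        return out  # only 4 of the 128 expert FFNs ran for any given token

tokens = torch.randn(8, 512)          # 8 tokens, illustrative width
print(Top4MoELayer()(tokens).shape)   # torch.Size([8, 512])
```

Because the router’s top‑k selection decides which expert weights are ever touched, compute scales with the small active slice rather than the full backbone, which is where the claimed throughput gains come from.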

In internal benchmark suites, Small 4’s high‑reasoning configuration matches or exceeds the performance of Mistral’s larger, task‑specific Magistral models, which were previously positioned as the gold standard for logical deduction and multimodal understanding (The‑Decoder). The model’s multimodal capability, spanning text generation and image processing, stems from shared expert modules that either modality can call on, reducing the need for separate vision‑only or language‑only pipelines. The consolidation extends to distribution: Small 4 ships as an open‑source release under the Apache 2.0 license, with code and weights freely available on Hugging Face, the Mistral API, and Nvidia’s model hub (The‑Decoder).
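Since the weights are on Hugging Face, loading them should follow the usual transformers workflow. The sketch below uses a hypothetical repository id (Mistral had not published the exact model card name at the time of writing), and a multimodal checkpoint may require an image‑text model class rather than the plain causal‑LM class shown here.

```python
# Minimal sketch of pulling open weights from Hugging Face.
# "mistralai/Mistral-Small-4" is a hypothetical repo id; check
# Mistral's actual model card. device_map="auto" requires accelerate.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "mistralai/Mistral-Small-4"  # hypothetical repo id
tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype="auto", device_map="auto")

prompt = "Summarise the mixture-of-experts idea in one sentence."
inputs = tok(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tok.decode(output[0], skip_special_tokens=True))
```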

Mistral AI’s decision to ship Small 4 as an open‑source artifact aligns with its broader strategy of fostering an ecosystem around open AI models. The company announced its participation in the Nvidia Nemotron Coalition, a consortium aimed at accelerating the development of open‑source large language models on Nvidia hardware (The‑Decoder). By placing Small 4 on Nvidia’s platform, Mistral ensures that the model can leverage tensor‑core optimizations and Nvidia’s latest inference libraries, which are critical for realizing the claimed 40 percent speed gains in real‑world deployments.

The release follows a rapid cadence of updates to Mistral’s open‑source portfolio. VentureBeat reported that the firm upgraded its Small series from version 3.1 to 3.2 just days before unveiling Small 4, a move that introduced incremental improvements to token efficiency and routing algorithms (VentureBeat). The same outlet highlighted that Small 4 “outperforms GPT‑4o Mini with a fraction of the parameters,” a claim Mistral backs with internal benchmark data but that has not yet been independently verified (VentureBeat). Nonetheless, the combination of MoE efficiency, multimodal support, and permissive licensing positions Small 4 as a compelling alternative for enterprises seeking high‑throughput, low‑latency AI services without the cost of proprietary cloud offerings.

Finally, Mistral’s broader product roadmap suggests that Small 4 will serve as the backbone for its newly announced Mistral Compute cloud service, a Europe‑focused AI‑optimized platform designed to compete with the likes of AWS and Azure (VentureBeat). By offering a model that can scale from edge‑device inference to large‑scale batch processing under a single permissive license, Mistral aims to win over developers who prefer open‑source flexibility to vendor lock‑in. If the performance claims hold up in external testing, Small 4 could set a new benchmark for how mixture‑of‑experts models are deployed in production environments.

Sources

  • The‑Decoder
  • VentureBeat
