
Mistral Small 4 Unifies Reasoning, Vision, and Coding, Beats GPT‑4.1 on Docs, Cuts Costs

Published by
SectorHQ Editorial


Mistral released Small 4, an open‑source model that unifies reasoning, vision and coding, and outperforms GPT‑4.1 on document benchmarks while cutting inference costs, VentureBeat reports.

Key Facts

  • Key company: Mistral

Mistral’s Small 4 arrives as a single model that collapses three traditionally separate AI stacks—reasoning, multimodal vision, and agentic coding—into one open‑source offering. The 119‑billion‑parameter mixture‑of‑experts (MoE) architecture activates only six billion parameters per token, a design Mistral says “delivers best‑in‑class efficiency” while preserving the capabilities of its earlier specialist models — Magistral for reasoning, Pixtral for vision, and Devstral for coding — according to the company’s blog post cited by VentureBeat. The model also supports a 256K context window, which the firm says suits long‑form conversations and analysis, and it is released under the Apache 2.0 license, positioning it for unrestricted commercial use.
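The efficiency claim rests on simple arithmetic: only a small share of the MoE’s weights participate in any given forward pass. A back‑of‑envelope sketch using the parameter counts above (the helper function is illustrative, not part of any Mistral API):

```python
# Back-of-envelope MoE efficiency, using the parameter counts reported above.
TOTAL_PARAMS = 119e9   # total parameters in the mixture-of-experts model
ACTIVE_PARAMS = 6e9    # parameters activated per token

def active_fraction(total: float, active: float) -> float:
    """Share of weights that participate in each forward pass."""
    return active / total

frac = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
print(f"{frac:.1%} of parameters active per token")  # ~5.0%
```

Per‑token compute therefore scales with the 6 billion active parameters rather than the 119 billion total, which is the basis for the lower‑latency, cheaper‑token claim.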

Performance on document‑understanding benchmarks paints a nuanced picture. On the IDP Core Bench, Small 4 scores 68.5 overall versus Qwen 3.5‑9B’s 76.2, and it trails on the OCR and KIE sub‑tasks as well (Qwen 65.5 vs 57.4 on OCR, 86.5 vs 78.3 on KIE). On the OmniDocBench, however, the gap narrows dramatically: Small 4 posts a 76.7 overall score against Qwen’s 76.4, and it actually leads on table‑structure metrics, achieving a TEDS of 75.1 versus Qwen’s 73.9 and a TEDS‑S of 82.7 versus 77.6. The OlmOCR benchmark tells a different story, with Qwen leading on every sub‑category (78.1 vs 69.6 overall) and a pronounced gap on math OCR (85.5 vs 66.0). Overall, Small 4 lands at rank #11 with a composite score of 71.5, while Qwen sits at #9 with 77.0, according to the IDP leaderboard compiled by VentureBeat.
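For readers who want the head‑to‑head numbers in one place, the scores quoted above can be tabulated directly (values exactly as reported; the dictionary layout is just a restatement, not an official leaderboard format):

```python
# Reported document-benchmark scores as (Small 4, Qwen 3.5-9B) pairs.
scores = {
    "IDP Core Bench overall":    (68.5, 76.2),
    "OmniDocBench overall":      (76.7, 76.4),
    "OmniDocBench TEDS":         (75.1, 73.9),
    "OmniDocBench TEDS-S":       (82.7, 77.6),
    "OlmOCR overall":            (69.6, 78.1),
    "IDP leaderboard composite": (71.5, 77.0),
}

for name, (small4, qwen) in scores.items():
    # Positive delta means Small 4 leads on that metric.
    print(f"{name}: Small 4 {small4} vs Qwen {qwen} ({small4 - qwen:+.1f})")
```

The deltas make the pattern visible at a glance: Small 4 wins only on the OmniDocBench table‑structure metrics and trails everywhere else.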

Cost efficiency is where Small 4 aims to differentiate itself. Mistral advertises “shorter outputs that translate to lower latency and cheaper tokens,” a claim bolstered by the model’s MoE design that limits active parameters per token. The full‑precision checkpoint weighs 242 GB, making on‑premise deployment impractical without quantization; the company therefore supplies a 4‑bit NVFP4 quantized version, which reduces the footprint dramatically and is the only realistic path for users lacking multiple H100 GPUs. While the vision capabilities of the quantized model remain untested, the full‑precision API runs at a fraction of the inference cost of larger proprietary models, a point highlighted in the VentureBeat report.
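The quantization math roughly checks out. A sketch assuming checkpoint size scales linearly with bits per weight (real checkpoints carry extra metadata such as per‑block scales, and some layers may be kept at higher precision, which would account for the gap to the reported 242 GB):

```python
# Rough checkpoint-size estimate at different weight precisions.
PARAMS = 119e9       # total parameter count
BYTES_PER_GB = 1e9   # decimal gigabytes

def checkpoint_gb(params: float, bits_per_weight: int) -> float:
    """Estimated checkpoint size in GB, ignoring format overhead."""
    return params * bits_per_weight / 8 / BYTES_PER_GB

for label, bits in [("BF16 full precision", 16), ("NVFP4 4-bit", 4)]:
    print(f"{label}: ~{checkpoint_gb(PARAMS, bits):.0f} GB")
```

At 16 bits per weight the estimate lands near 238 GB, close to the stated 242 GB; at 4 bits it drops to roughly 60 GB, which is what makes deployment without a multi‑H100 node plausible.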

Industry observers note that Small 4’s architectural flexibility could simplify AI pipelines for enterprises that currently juggle separate models for different tasks. Rob May, co‑founder and CEO of the small‑language‑model marketplace Neurometric, told VentureBeat that the model “stands out for its architectural flexibility,” but he also warned that the proliferation of niche, smaller models risks further fragmenting the market. The trade‑off is evident in the benchmark results: despite having a far larger total parameter count than Qwen 3.5‑9B, Small 4’s performance ceiling is modest, suggesting that raw size does not guarantee superiority on document‑centric workloads.

Mistral’s strategy of open‑weight, cost‑focused AI places it in direct competition with other compact models such as Qwen and Claude Haiku, while also challenging the dominance of proprietary giants like OpenAI and Anthropic. By bundling reasoning, vision, and coding into a single, efficiently run model, Mistral hopes to attract developers seeking a “build‑your‑own” AI stack without the licensing constraints of closed‑source offerings. Whether the modest gains on specific benchmarks translate into broader enterprise adoption will depend on whether the quantized version retains vision fidelity and on the market’s appetite for a unified, open‑source alternative to the fragmented AI toolchains that dominate today.


Reporting based on verified sources and public filings. Sector HQ editorial standards require multi-source attribution.
