Google’s Gemini 3.1 Flash‑Lite Gains Smarts While Tripling Its Price
Photo by BoliviaInteligente (unsplash.com/@boliviainteligente) on Unsplash
While Gemini 3.1 Flash‑Lite was billed as Google’s cheapest, fastest model, The‑Decoder reports it now scores 34 on the Intelligence Index—12 points higher than Gemini 2.5 Flash‑Lite—yet its price has tripled.
Key Facts
- Key company: Google
Google’s Gemini 3.1 Flash‑Lite now posts a 34‑point score on Artificial Analysis’s Intelligence Index, a 12‑point jump over Gemini 2.5 Flash‑Lite, while still generating more than 360 tokens per second and delivering its first token 2.5× faster than the older Gemini 2.5 Flash model, according to the preview released by DeepMind and reported by The‑Decoder. The speed boost translates to an average response time of 5.1 seconds, keeping the model in the “fastest” tier of Google’s Gemini 3 series despite the leap in capability.
The performance gains are most evident on multimodal benchmarks. Gemini 3.1 Flash‑Lite scored 76.8% on the MMMU‑Pro suite for multimodal reasoning, outpacing Claude Opus 4.6 and Kimi K2.5, and 86.9% on GPQA Diamond for scientific knowledge, per Artificial Analysis’s leaderboard data. On the Arena.ai human‑preference ranking, the model earned an Elo of 1,432, the highest among models of comparable size, confirming its edge in reasoning and multimodal understanding.
The price tag, however, has risen sharply. Output pricing now stands at $1.50 per million tokens—up from $0.40 on Gemini 2.5 Flash‑Lite—while input costs have increased from $0.10 to $0.25 per million tokens, according to the pricing table in The‑Decoder’s coverage. That represents more than a three‑fold jump, effectively ending the “cheapest” claim that defined the original Flash‑Lite positioning. Google’s own documentation notes that developers can still dial the model’s “thinking” depth, allowing the same model to handle high‑volume translation jobs as well as heavier tasks like UI generation, but the higher per‑token rates will bite into margins for large‑scale users.
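To put the new rates in perspective, here is a rough back‑of‑the‑envelope sketch using the per‑million‑token prices quoted above. The monthly token volumes are hypothetical, chosen only to illustrate how the increase compounds for a high‑volume workload:

```python
def monthly_cost(input_tokens: int, output_tokens: int,
                 input_rate: float, output_rate: float) -> float:
    """Cost in dollars; rates are $ per million tokens."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# Rates from the article's pricing table.
OLD_IN, OLD_OUT = 0.10, 0.40   # Gemini 2.5 Flash-Lite
NEW_IN, NEW_OUT = 0.25, 1.50   # Gemini 3.1 Flash-Lite

# Hypothetical workload: 2B input tokens, 500M output tokens per month.
old = monthly_cost(2_000_000_000, 500_000_000, OLD_IN, OLD_OUT)
new = monthly_cost(2_000_000_000, 500_000_000, NEW_IN, NEW_OUT)

print(f"old: ${old:,.2f}  new: ${new:,.2f}  ratio: {new / old:.2f}x")
# old: $400.00  new: $1,250.00  ratio: 3.12x
```

For this particular input/output mix, the bill roughly triples; workloads that skew more heavily toward output tokens would see an even larger jump.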
Tool usage, a metric where many competitors have made strides, shows only marginal improvement. Artificial Analysis observed that Gemini 3.1 Flash‑Lite’s tool‑use performance remains roughly on par with its predecessor, suggesting that the model’s core strength lies in raw reasoning rather than external API integration. The context window stays at a massive one‑million tokens, preserving the ability to handle long documents without truncation—a feature that continues to differentiate Google’s Gemini line from rivals such as GPT‑5 mini and Claude 4.5.
In the broader AI market, the trade‑off between speed, capability, and cost is becoming a decisive factor for enterprises. While Gemini 3.1 Flash‑Lite’s token‑per‑second rate (363 t/s) still dwarfs GPT‑5 mini’s 71 t/s and Claude 4.5’s 108 t/s, the steep price increase may push cost‑sensitive developers toward alternatives that offer a more balanced price‑performance curve. As competition tightens, Google’s next move—whether to slash rates, introduce a new “lite” tier, or double down on premium performance—will shape how the Gemini brand retains its foothold in the fast‑moving generative‑AI landscape.
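To make those throughput figures concrete, a quick sketch of raw generation time for a single response. The 2,000‑token answer length is hypothetical, the rates are the ones cited above, and time‑to‑first‑token is ignored:

```python
# Tokens-per-second figures cited in the article.
rates = {
    "Gemini 3.1 Flash-Lite": 363,
    "GPT-5 mini": 71,
    "Claude 4.5": 108,
}

answer_tokens = 2_000  # hypothetical response length

for model, tps in rates.items():
    # Pure generation time, excluding time-to-first-token and network latency.
    print(f"{model}: {answer_tokens / tps:.1f} s")
# Gemini 3.1 Flash-Lite: 5.5 s
# GPT-5 mini: 28.2 s
# Claude 4.5: 18.5 s
```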
This article was created using AI technology and reviewed by the SectorHQ editorial team for accuracy and quality.