
Google launches Gemini 3.1 Flash‑Lite, scaling AI intelligence for enterprise use

Written by
Maren Kessler
AI News

Photo by Solen Feyissa (unsplash.com/@solenfeyissa) on Unsplash

While developers have wrestled with Gemini 2.5 Flash’s higher cost and latency, Google now offers Gemini 3.1 Flash‑Lite: a preview-ready model that is roughly five times cheaper per token and faster, according to the company's announcement blog post.


Google’s newest Gemini model arrives as a direct answer to the cost-and-latency complaints that have dogged Gemini 2.5 Flash among enterprise developers. In a blog post dated March 3, 2026, the Gemini team announced Gemini 3.1 Flash‑Lite, a preview-ready model that costs just $0.25 per million input tokens and $1.50 per million output tokens, roughly five times cheaper than its predecessor (Google AI Blog). The company also claims a 2.5× faster time to first answer token and a 45% boost in overall output speed, measured on the Artificial Analysis benchmark, while delivering comparable or better quality (Google AI Blog).
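At those quoted rates, per-request or per-workload cost is easy to estimate. A minimal sketch (the workload figures are hypothetical; only the per-million-token prices come from the announcement):

```python
# Cost estimate at the quoted Gemini 3.1 Flash-Lite rates (USD per 1M tokens).
INPUT_PRICE_PER_M = 0.25
OUTPUT_PRICE_PER_M = 1.50

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a request at the quoted preview rates."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M

# Hypothetical daily workload: 10M input tokens, 2M output tokens.
daily = request_cost(10_000_000, 2_000_000)
print(f"${daily:.2f} per day")  # 10 * 0.25 + 2 * 1.50 = $5.50
```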

Performance metrics place 3.1 Flash‑Lite near the top of its tier on third‑party leaderboards. The model posted an Elo score of 1,432 on the Arena.ai leaderboard and outperformed peers on reasoning and multimodal benchmarks, hitting 86.9% on GPQA Diamond and 76.8% on MMMU Pro—scores that even surpass earlier Gemini generations such as 2.5 Flash (Google AI Blog). These figures suggest the model can handle both high‑volume, low‑complexity tasks (e.g., bulk translation, content moderation) and more nuanced workloads like UI generation, simulation building, and multi‑step SaaS agent orchestration (Google AI Blog).

Developers gain granular control over “thinking levels” through the Gemini API in Google AI Studio and Vertex AI, allowing them to dial in the amount of model reasoning required for a given request (Google AI Blog). Early‑access partners—including Latitude, Cartwheel, and Whering—have already deployed Flash‑Lite for use cases such as populating e‑commerce wireframes with hundreds of products, generating real‑time weather dashboards, and sorting large image collections (Google AI Blog). By offering a low‑latency, cost‑effective option, Google aims to capture the “high‑frequency workflows” market that has been a pain point for customers of larger, more expensive models.
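In practice, dialing in a thinking level would look something like the following request body to the Gemini API's `generateContent` endpoint. The field names follow the currently documented API shape (`generationConfig.thinkingConfig`); whether Flash‑Lite accepts exactly these values is an assumption, and the prompt text is illustrative:

```json
{
  "contents": [
    {
      "role": "user",
      "parts": [{ "text": "Classify this product description into one of our catalog categories." }]
    },
    "generationConfig": is assumed from current Gemini API docs,
  ]
}
```

A low thinking level trades reasoning depth for latency and cost, which fits the bulk-classification and moderation workloads the post describes.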

The launch comes amid heightened scrutiny of Google’s defense contracts. Reuters reported that Alphabet will let its U.S. military drone‑imagery analysis contract lapse in March, a move spurred by internal employee protests (Reuters). While the Gemini rollout is unrelated to that decision, the timing underscores Google’s broader strategy to pivot toward commercial AI services that can be monetized at scale without the political baggage of government work.

Analysts see Gemini 3.1 Flash‑Lite as a tactical play to lock in enterprise customers before rivals such as Anthropic and Microsoft roll out their own low‑cost, high‑throughput models. The pricing structure—$0.25 per million input tokens and $1.50 per million output tokens—positions Flash‑Lite well below the rates quoted for comparable OpenAI and Anthropic offerings, according to the Google blog. If adoption mirrors early‑access reports, the model could become the default choice for developers building real‑time, high‑volume AI features, reinforcing Google’s foothold in the enterprise generative‑AI market.


This article was created using AI technology and reviewed by the SectorHQ editorial team for accuracy and quality.

