Google launches offline AI dictation app as Gemma 4 runs fully on iPhone, boosting edge AI
Photo by Markus Spiske on Unsplash
A 128 K-token context window, roughly 100,000 words, now runs fully offline on iPhone: Google's Gemma 4 E2B variant, a 2.54 GB download, reportedly operates within 1.5 GB of RAM without any cloud connection.
Key Facts
- Key company: Google
Google’s new iOS app, Google AI Edge Eloquent, isn’t just a novelty—it’s the first consumer‑ready product that lets the Gemma 4 family run entirely on a phone without ever touching the cloud. The app bundles the 2.54 GB Gemma 4 E2B model, which, thanks to 2‑bit/4‑bit MoE quantisation, fits comfortably inside 1.5 GB of RAM and can hold a 128 K‑token (≈100 k‑word) context window on‑device, according to the “Gemma 4 runs fully offline on iPhone” report. In practice that means you can feed the model a full‑length article, a long email thread, or a multi‑page script and have it reason over the entire text without a single byte leaving your iPhone. No API key, no subscription, no data‑privacy concerns—just a pure‑offline inference engine built on Apache 2.0‑licensed code released in early April (see the same report).
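Those numbers hang together on a quick back-of-envelope check. The sketch below runs the arithmetic; the words-per-token ratio and the reading of "E2B" as roughly two billion active parameters are assumptions for illustration, not figures from the report.

```python
GIB = 1024 ** 3

ram_budget_gib = 1.5      # resident-memory claim, from the report
context_tokens = 128_000  # 128 K-token window, from the report

# Assumed ratio of ~0.78 English words per token (not from the report).
print(f"context = about {context_tokens * 0.78 / 1000:.0f}k words")  # ~100k

# If "E2B" means ~2 billion active parameters (an assumption) stored at an
# average of 3 bits each (midway between the 2-bit and 4-bit levels), the
# active weights alone occupy:
active_params = 2e9
active_gib = active_params * 3 / 8 / GIB
print(f"active weights = about {active_gib:.2f} GiB of {ram_budget_gib} GiB")
```

On those assumptions the active weights claim under half the stated budget, leaving headroom for the KV cache and runtime overhead.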
The app's primary consumer hook is its offline dictation workflow. As 9to5Google describes, users speak into the microphone, watch a live transcription and waveform, then hit "stop" to let the model clean up the raw speech. The cleanup stage strips filler words ("um", "ah"), normalises punctuation, and offers a handful of post-processing "tools" – Key points, Formal, Short, Long – that re-format the text on the fly. Once the final version is ready, it's automatically copied to the clipboard for pasting into any other app. TechCrunch notes that the app is free to download and that the Gemma-based ASR models are downloaded once, after which the entire dictation pipeline stays offline, a stark contrast to cloud-dependent rivals like Wispr Flow or SuperWhisper.
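Google hasn't published the app's internals, but the workflow 9to5Google describes maps onto a familiar two-stage pattern: scrub the raw transcript deterministically, then re-prompt the model once per formatting tool. The sketch below is a minimal, hypothetical illustration of that flow; the regexes, prompt strings, and the run_gemma callable are stand-ins, not Google's code.

```python
import re

# Filler words plus any trailing punctuation/space, e.g. "Um, " or "uh ".
FILLERS = re.compile(r"\b(?:um+|uh+|ah+|er+)\b[,.]?\s*", re.IGNORECASE)

# Hypothetical prompts for the four post-processing tools the app exposes.
TOOL_PROMPTS = {
    "key_points": "Rewrite the transcript as bullet-point key points:\n",
    "formal": "Rewrite the transcript in a formal register:\n",
    "short": "Condense the transcript, keeping its meaning:\n",
    "long": "Expand the transcript into fuller prose:\n",
}

def clean_transcript(raw: str) -> str:
    """Deterministic cleanup pass: drop fillers, tidy punctuation."""
    text = FILLERS.sub("", raw)
    text = re.sub(r"\s+([,.!?;:])", r"\1", text)  # no space before punctuation
    text = re.sub(r"\s{2,}", " ", text).strip()   # collapse double spaces
    return text[0].upper() + text[1:] if text else text

def apply_tool(transcript: str, tool: str, run_gemma) -> str:
    """Second pass: hand the cleaned text to the on-device model."""
    return run_gemma(TOOL_PROMPTS[tool] + clean_transcript(transcript))

print(clean_transcript("Um so we should uh ship the fix tomorrow"))
# -> "So we should ship the fix tomorrow"
```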
Beyond speech, Google AI Edge Eloquent doubles as a sandbox for on‑device multimodal AI. Simon Willison’s “Google AI Edge Gallery” points out that the same app can answer questions about images, transcribe up to 30‑second audio clips, and run a “skills” demo that showcases tool‑calling against eight interactive widgets (interactive‑map, kitchen‑adventure, calculate‑hash, text‑spinner, mood‑tracker, mnemonic‑password, etc.). While the source code for those widgets isn’t publicly exposed, the demo proves that the Gemma 4 models can orchestrate multi‑step reasoning and invoke external functions entirely on the phone, a capability that previously required server‑side inference.
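The widget source isn't exposed, but the behaviour implies a conventional tool-calling dispatch loop: the model emits a structured call, the app executes the matching handler locally, and the result is fed back for the next reasoning step. Here is a schematic version; the JSON shape and handler bodies are assumptions, with only calculate-hash doing real work:

```python
import hashlib
import json

def calculate_hash(args: dict) -> str:
    # Plausible behaviour for a "calculate-hash"-style widget.
    return hashlib.sha256(args["text"].encode("utf-8")).hexdigest()

def mood_tracker(args: dict) -> dict:
    # Stand-in for the mood-tracker widget; real behaviour unknown.
    return {"logged": args["mood"]}

HANDLERS = {"calculate-hash": calculate_hash, "mood-tracker": mood_tracker}

def dispatch(model_output: str):
    """Route a model-emitted tool call to a local handler, fully on-device."""
    call = json.loads(model_output)  # assumed shape: {"tool": ..., "args": {...}}
    if call["tool"] not in HANDLERS:
        raise ValueError(f'unknown tool: {call["tool"]}')
    return HANDLERS[call["tool"]](call["args"])

print(dispatch('{"tool": "calculate-hash", "args": {"text": "hello"}}'))
```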
The technical achievement lies in how the E2B variant squeezes a 2.54 GB model into 1.5 GB of RAM without swapping, even on "most modern iPhones," the offline report confirms. This is not a crippled, low-capacity version; it retains the full MoE architecture and benefits from aggressive quantisation, delivering the same 128 K-token context window that the cloud-hosted Gemma 4 models provide. In practice, users can load a lengthy document, ask the model to summarise, extract entities, or even chain actions across the eight built-in skills, all while the phone's CPU and memory stay within comfortable limits. The result is a truly edge-first AI experience that sidesteps the latency, bandwidth, and privacy hurdles that have hamstrung mobile AI until now.
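The report doesn't say how a 2.54 GB checkpoint stays under a 1.5 GB resident budget, but one plausible mechanism, common in on-device runtimes such as llama.cpp, is to memory-map the weights file so the OS pages in only the experts a token actually routes to. The sketch below illustrates the idea; the file layout (a header followed by fixed-size expert blocks) is invented, not the real Gemma format.

```python
import mmap

import numpy as np

def open_expert_loader(path: str, expert_bytes: int, header_bytes: int = 0):
    """Memory-map a checkpoint so only routed-to experts get paged in."""
    f = open(path, "rb")
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

    def load_expert(index: int) -> np.ndarray:
        # frombuffer returns a zero-copy view; the OS faults in only the
        # touched pages, so resident memory tracks the experts in use.
        offset = header_bytes + index * expert_bytes
        return np.frombuffer(mm, dtype=np.uint8, count=expert_bytes, offset=offset)

    return load_expert
```

Because such pages are clean, read-only views of the file, the OS can evict and re-fault them at will, which would square with the report's "without swapping" claim.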
Google's move signals a broader shift toward on-device AI ecosystems. By packaging a high-performing LLM, a speech-to-text pipeline, and multimodal tool-calling into a single, free iOS app, Google is testing the waters for a future where "AI as a service" is optional rather than mandatory. The company has not announced any monetisation plan for the app, and the 9to5Google piece notes the absence of a subscription model, in contrast to the paywalls that dominate many competing dictation tools. If the app gains traction, it could become a reference point for developers looking to embed LLM capabilities directly into their own mobile products without relying on cloud APIs.
In short, Google AI Edge Eloquent is more than a neat offline dictation toy; it's a proof-of-concept that the latest generation of LLMs can live comfortably on a consumer smartphone, offering 100,000-word context, multimodal reasoning, and privacy-first processing, all without a data plan. As The Verge's own coverage of edge AI has shown, the real battle now is not whether models can run on-device, but how developers will leverage that capability to build richer, faster, and more secure experiences for everyday users.
Sources
- Reddit - r/LocalLLaMA
Reporting based on verified sources and public filings. Sector HQ editorial standards require multi-source attribution.