Developer Ships Fully Local iOS App Using Apple's On-Device 3-Billion-Parameter Model, No Cloud Required
Apple's on-device 3-billion-parameter model powers a new fully local iOS app that runs entirely on the Neural Engine at about 30 tokens per second, with no cloud API or external service required, according to a recent developer report.
Key Facts
- Key company: Apple
Apple’s on‑device 3‑billion‑parameter model, introduced as part of Apple Intelligence, is now the core of a fully local iOS app called StealthOS, according to the developer who built it and posted a detailed report on GitHub. The app runs exclusively on the Neural Engine of supported iPhone and iPad hardware, delivering roughly 30 tokens per second without invoking any external API or cloud service. By keeping the entire inference pipeline on‑device, StealthOS promises the privacy‑first experience Apple has been marketing, while also demonstrating that a 3 B model can be practical for everyday mobile tasks such as phishing detection, document summarisation, and file‑based question answering.
The developer notes that the model’s modest size is offset by tightly scoped system prompts that guide the LLM into eight specialised modes—researcher, coder, analyst, and others. This prompt engineering, combined with the low‑latency Neural Engine, makes conversational interactions feel “natural,” especially in voice mode where the 30‑token‑per‑second throughput keeps latency low enough for real‑time dialogue. The report highlights that the speed “surprised” the author, suggesting that Apple’s custom silicon can deliver usable generative performance on a phone, a claim echoed by ZDNet’s coverage of the same app, which described the two‑day development effort as “electrifying” and praised the voice‑driven coding workflow enabled by the on‑device model.
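The report names only three of the eight modes and does not include the app's actual prompts, but the pattern of tightly scoped, per-mode system prompts can be sketched in Swift. The fourth mode name and all prompt wording below are illustrative assumptions, not the developer's implementation:

```swift
import Foundation

// Hypothetical mode set; only researcher, coder, and analyst are named in the report.
enum AssistantMode: String, CaseIterable {
    case researcher, coder, analyst, summarizer
}

// Build a tightly scoped system prompt for each mode. Narrow, explicit
// instructions help keep a small 3B model on task; wording is illustrative.
func systemPrompt(for mode: AssistantMode) -> String {
    let base = "You are a concise on-device assistant. Answer only within your role."
    switch mode {
    case .researcher:
        return base + " Role: researcher. State which provided document each claim comes from."
    case .coder:
        return base + " Role: coder. Reply with code blocks and one-line explanations."
    case .analyst:
        return base + " Role: analyst. Summarize data and flag anomalies in bullet points."
    case .summarizer:
        return base + " Role: summarizer. Compress the input to at most five sentences."
    }
}

// The returned string would be supplied as the session's system instructions
// when creating a model session (e.g. via Apple's Foundation Models framework).
```

Because the prompt is selected before the session starts, switching modes costs nothing at inference time, which matters at a throughput of only ~30 tokens per second.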
However, the implementation is not without constraints. The app only runs on iOS 26+ devices equipped with Apple Intelligence—currently those with the A17 Pro chip or M-series Apple silicon—because the model weights are baked into the operating system and cannot be swapped out by developers. The context window is also smaller than what desktop-oriented local LLMs such as Phi‑4 can offer, limiting the length of prompts and histories that can be processed in a single pass. Moreover, integrating 26 auxiliary tools (web search, file operations, vision, etc.) required creative structured prompting, since the on-device environment does not support the function-calling mechanisms that cloud-based APIs like OpenAI's or Anthropic's provide.
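The report does not show how the structured prompting works. One common workaround, sketched here under assumed names and an assumed JSON wire format, is to instruct the model to emit a small JSON object naming a tool and its arguments, then parse that text instead of relying on native function calling:

```swift
import Foundation

// Assumed wire format: the model is prompted to reply with JSON such as
// {"tool": "web_search", "arguments": {"query": "latest iOS release"}}
struct ToolCall: Codable, Equatable {
    let tool: String
    let arguments: [String: String]
}

// Extract the first JSON object from the model's raw text output.
// Small models often wrap JSON in prose, so scan for the outermost braces.
func parseToolCall(from output: String) -> ToolCall? {
    guard let start = output.firstIndex(of: "{"),
          let end = output.lastIndex(of: "}"),
          start < end,
          let data = String(output[start...end]).data(using: .utf8)
    else { return nil }
    return try? JSONDecoder().decode(ToolCall.self, from: data)
}
```

Dispatching on `call.tool` to one of the 26 handlers would then stand in for the function-calling layer that a cloud API normally provides; the trade-off is that malformed model output must be caught and retried, which the `nil` return path makes explicit.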
VentureBeat’s recent analysis of Apple’s on‑device AI ecosystem places StealthOS in a broader strategic context: Apple is positioning its offline models as a differentiator against rivals that rely heavily on cloud compute. By enabling developers to ship fully local AI experiences, Apple can lock users into its hardware ecosystem while sidestepping regulatory scrutiny over data transmission. The StealthOS case study validates that the trade‑off—accepting a smaller model and a tighter prompt regime—can still yield “useful” applications, especially for privacy‑sensitive workflows where data never leaves the device.
The emergence of a production‑grade, on‑device LLM app marks a milestone for Apple’s AI ambitions. While the 3 B model does not compete with the 70 B‑plus models dominating enterprise and research labs, its integration into a consumer‑facing app demonstrates that Apple’s Neural Engine can deliver generative capabilities at a scale that aligns with its privacy narrative. As more developers experiment with the built‑in model—subject to the hardware and OS limitations outlined in the developer’s report—the market may see a new class of AI‑enhanced iOS tools that operate entirely offline, reshaping the balance between convenience, performance, and data sovereignty.
Sources
No primary source found (coverage-based)
- Reddit - r/LocalLLaMA
This article was created using AI technology and reviewed by the SectorHQ editorial team for accuracy and quality.