OpenAI runs code on the fastest, largest AI chip ever built, setting new performance records
While earlier AI code assistants lagged behind human speed, OpenAI’s new GPT‑5.3‑Codex‑Spark runs on the fastest, largest AI chip ever built, delivering over 1,000 tokens per second, Jackpearce reports.
Key Facts
- Key company: OpenAI
- Also mentioned: Cerebras
OpenAI’s GPT‑5.3‑Codex‑Spark is the first model engineered explicitly for “real‑time” coding, a claim backed by the company’s own rollout notes that the system can sustain more than 1,000 tokens per second while handling a 128 k‑token context window — a speed that eclipses earlier AI code assistants, which “lagged behind human speed,” according to Jackpearce (2026). The performance boost is not solely a software trick; it rests on Cerebras’s Wafer‑Scale Engine 3 (WSE‑3), the “world’s largest AI processor for training and inference,” which OpenAI has integrated via its partnership with the chipmaker. Cerebras lists the WSE‑3 at 46,255 mm² with 4 trillion transistors, delivering 125 petaflops of AI compute across 900,000 AI‑optimized cores, a hardware envelope that is “19× more transistors and 28× more compute than NVIDIA’s B200” — the same benchmark that has underpinned most large‑scale transformer deployments to date (Cerebras, as cited by Jackpearce).
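The "19×" and "28×" figures cited above can be sanity-checked with quick arithmetic. The WSE-3 numbers come from the article itself; the B200 comparison figures used here (roughly 208 billion transistors and about 4.5 petaflops of dense FP8 compute) are assumed from NVIDIA's public specifications, not stated in the source:

```python
# Back-of-the-envelope check of the cited WSE-3 vs. B200 ratios.
# WSE-3 figures are from the article; B200 figures are assumed
# from public specs and may differ by precision mode.

WSE3_TRANSISTORS = 4e12    # 4 trillion (article)
WSE3_PFLOPS = 125          # AI compute (article)
B200_TRANSISTORS = 208e9   # ~208 billion (assumed)
B200_PFLOPS = 4.5          # ~4.5 PFLOPS dense FP8 (assumed)

transistor_ratio = WSE3_TRANSISTORS / B200_TRANSISTORS
compute_ratio = WSE3_PFLOPS / B200_PFLOPS

print(f"transistors: ~{transistor_ratio:.0f}x")  # ~19x
print(f"compute:     ~{compute_ratio:.0f}x")     # ~28x
```

Under those assumptions the arithmetic lines up with the marketing claim, though the compute ratio depends heavily on which precision and sparsity mode is used for the B200 baseline.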
Beyond raw silicon, OpenAI re‑architected the request‑response pipeline to shave latency at every stage. The company introduced persistent WebSockets and other stack‑level optimizations that, per the announcement, cut per‑client round‑trip overhead by 80 %, per‑token overhead by 30 %, and time‑to‑first‑token by half. Those reductions, when combined with the WSE‑3’s massive throughput, enable the “near‑instant” experience the firm promises for developers who need to iterate quickly on UI tweaks or small code changes. Early users on the free research preview have described the experience as “incredibly addictive,” noting that the speed makes it practical to ask rapid, context‑aware questions about a codebase without the usual waiting period (Jackpearce).
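The intuition behind the persistent-connection optimization can be sketched with a simple latency model. All figures below are illustrative assumptions, not OpenAI's published numbers: a fresh HTTPS connection pays several handshake round trips before the first token, while an already-open WebSocket pays only the request round trip:

```python
# Illustrative latency model for time-to-first-token (TTFT).
# Every constant here is an assumption chosen for demonstration,
# not a measured or published OpenAI figure.

RTT_MS = 40                 # assumed client-server round-trip time
HANDSHAKE_RTTS = 3          # assumed TCP + TLS round trips for a new connection
MODEL_FIRST_TOKEN_MS = 120  # assumed server-side time to produce token one

def ttft_new_connection() -> int:
    # A cold request pays the full handshake plus one request round trip.
    return HANDSHAKE_RTTS * RTT_MS + RTT_MS + MODEL_FIRST_TOKEN_MS

def ttft_persistent_ws() -> int:
    # The WebSocket is already open; only the request round trip remains.
    return RTT_MS + MODEL_FIRST_TOKEN_MS

cold, warm = ttft_new_connection(), ttft_persistent_ws()
print(f"fresh connection TTFT: {cold} ms")   # 280 ms
print(f"persistent WS TTFT:    {warm} ms")   # 160 ms
print(f"reduction: {100 * (cold - warm) / cold:.0f}%")
```

The exact savings depend on real network conditions and server-side work, but the model shows why eliminating per-request handshakes alone can cut time-to-first-token by a large fraction, in the same direction as the halving the announcement describes.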
The move positions OpenAI against a backdrop of intensifying competition in AI‑assisted development tools. While rivals such as Anthropic and Google have released code‑focused models, none have publicly paired a model with a wafer‑scale processor of this magnitude. Industry observers on Reddit have expressed mixed feelings: some question whether ultra‑fast inference translates into real‑world productivity gains, while others applaud the capability for “making small changes to your codebase that don’t require a ton of validation.” The debate underscores a broader question about the value of speed versus accuracy in software engineering workflows, a tension that OpenAI appears to be betting on by emphasizing latency reductions as a core product differentiator.
Financially, the launch arrives as OpenAI continues to absorb massive capital inflows. Bloomberg’s Gautam Mukunda warned that the company’s recent $40 billion funding round could “lead startups to chase all the wrong priorities,” yet that same capital surge underwrites ambitious hardware partnerships like the one with Cerebras and gives OpenAI the means to secure cutting‑edge silicon, a strategic advantage that could lock in enterprise customers seeking the fastest coding assistance available. If the performance claims hold up under broader adoption, the WSE‑3‑backed Codex‑Spark could become a de facto standard for high‑throughput AI coding, compelling competitors either to develop comparable wafer‑scale solutions or to double down on software‑only optimizations.
Analysts will be watching early adoption metrics closely. The free research preview’s “low, medium, high, and extra‑high effort modes” provide a sandbox for developers to test the model’s limits, but OpenAI has not yet disclosed usage statistics or revenue projections tied to the new offering. As the AI industry grapples with scaling challenges—both in compute and in model latency—OpenAI’s gamble on the largest AI chip ever built may set a precedent: that the next frontier of productivity tools will be defined not just by model architecture, but by the hardware that powers them.
Sources
Reporting based on verified sources and public filings. Sector HQ editorial standards require multi-source attribution.