OpenAI launches GPT‑5.4, delivering new tools and specs developers need now
Photo by Jonathan Kemper (unsplash.com/@jupp) on Unsplash
1 million tokens. That’s the new context window OpenAI’s GPT‑5.4 ships with, alongside native computer‑use capabilities and a 47% reduction in token usage for tool search, according to a recent report.
Key Facts
- Key company: OpenAI
- Model: GPT‑5.4, a unified, API‑first model with native computer use
- Context window: 1 million tokens on the API and Codex endpoints
- Tool search cuts multi‑tool token overhead by up to 47 %
- Pricing: $2.50 per million input tokens, $10 per million output tokens
OpenAI’s GPT‑5.4 arrives as a single, unified model that folds the most advanced capabilities of the GPT‑5 series into one API‑first offering, according to a detailed technical report by Tyson Cung posted on March 10, 2026. The headline feature is “native computer use,” a built‑in ability for the model to read screenshots, click buttons, type text and navigate desktop applications without relying on external plugins. In benchmark tests the model achieved a 75.0 % success rate on the OSWorld desktop‑navigation suite, surpassing the human baseline of 72.4 % and outpacing its predecessor GPT‑5.2 by 27.7 percentage points (Cung). For developers, the computer‑use tool can be steered through developer messages and safety policies, opening the door to fully automated UI testing, robust robotic‑process‑automation (RPA) flows that survive UI changes, and AI agents capable of multi‑step form‑filling or script execution without hand‑crafted click sequences.
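The report does not publish the computer‑use action schema, but the loop it implies can be sketched: the model emits structured UI actions (click, type, screenshot) and a host driver applies them to the screen. Everything below is illustrative, with a simulated screen state standing in for a real desktop; the action names and fields are assumptions, not OpenAI’s actual schema.

```python
# Hypothetical host-side loop for a native computer-use tool: the model emits
# structured actions, the driver executes them and returns new screen state.
# Action names/fields are illustrative, not an official schema.

def execute_action(action: dict, screen: dict) -> dict:
    """Apply one model-emitted UI action to a simulated screen state."""
    kind = action.get("type")
    if kind == "click":
        screen["focused"] = action["target"]           # focus the clicked element
    elif kind == "type":
        field = screen["focused"]                      # type into the focused field
        screen.setdefault("fields", {})[field] = action["text"]
    elif kind == "screenshot":
        screen["last_capture"] = dict(screen)          # snapshot sent back to the model
    else:
        raise ValueError(f"unsupported action: {kind}")
    return screen

def run_agent_loop(actions: list[dict]) -> dict:
    """Replay a sequence of actions, as a driver would after each model turn."""
    screen: dict = {"focused": None}
    for action in actions:
        screen = execute_action(action, screen)
    return screen
```

In a real deployment the driver would dispatch to an OS automation layer instead of mutating a dict, and each screenshot would be fed back to the model for the next decision.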
The second major upgrade is a 1 million‑token context window for the API and Codex endpoints, nearly four times the 272 K‑token standard context and well beyond the limits of earlier GPT‑5 models. Cung notes that requests exceeding the 272 K threshold are billed at double the normal rate and exhibit a modest drop in recall (79.3 % at 128–256 K tokens on the MRCR 8‑needle test). The extended window is not intended for indiscriminate “dump‑the‑whole‑repo” calls; rather, it is positioned for tasks that genuinely demand long‑horizon planning, such as processing an entire codebase or chaining dozens of prior agent actions. OpenAI has kept the ChatGPT consumer interface at the conventional window size, reserving the massive context for enterprise developers who need deep, token‑rich interactions.
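The billing rule is worth making concrete. Using the article’s figures ($2.50 per million input tokens, a 272 K standard context, double rate above it), a back‑of‑the‑envelope estimator looks like this; whether the surcharge applies to the whole request or only the excess tokens is not specified in the report, so applying it to the whole request is an assumption here.

```python
# Input-cost estimator for the long-context tier. Prices and threshold come
# from the article; applying the 2x multiplier to the ENTIRE request (not just
# the excess tokens) is an assumption based on the report's wording.

INPUT_PRICE_PER_M = 2.50       # USD per million input tokens
STANDARD_CONTEXT = 272_000     # tokens billed at the normal rate
LONG_CONTEXT_MULTIPLIER = 2    # surcharge once a request exceeds the threshold

def input_cost_usd(tokens: int) -> float:
    """Estimated input cost of a single request, in USD."""
    rate = INPUT_PRICE_PER_M / 1_000_000
    if tokens > STANDARD_CONTEXT:
        rate *= LONG_CONTEXT_MULTIPLIER
    return tokens * rate
```

Under this reading, a 1 million‑token prompt costs about $5.00 rather than $2.50, which is why the report discourages "dump‑the‑whole‑repo" usage unless the task truly needs it.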
A third, developer‑centric enhancement is “tool search,” which slashes the token overhead of multi‑tool workflows by up to 47 %. Cung’s testing on Scale’s MCP Atlas benchmark—250 tasks across 36 MCP servers—shows that GPT‑5.4 can retrieve full tool definitions on demand instead of pre‑loading them, preserving accuracy while dramatically reducing prompt length. This efficiency gain is especially valuable for applications that integrate dozens of APIs or internal services, where the prompt can otherwise balloon to tens of thousands of tokens before any query is issued. The reduction translates directly into lower API costs and faster response times, making GPT‑5.4 a compelling upgrade for any platform that relies on extensive tool‑calling capabilities.
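The mechanics behind that saving can be sketched in a few lines: ship only a cheap index of tool names in every prompt, and return a full JSON‑schema definition only when the model searches for it. The registry, tool names and schemas below are made up for illustration; the report does not document GPT‑5.4’s internal retrieval format.

```python
# Sketch of the "tool search" idea: instead of pre-loading every tool's full
# JSON schema into the prompt, expose a names-only index and fetch complete
# definitions on demand. Tools and schemas here are illustrative.

FULL_DEFINITIONS = {
    "get_weather": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {"type": "object",
                       "properties": {"city": {"type": "string"}}},
    },
    "send_email": {
        "name": "send_email",
        "description": "Send an email to a recipient.",
        "parameters": {"type": "object",
                       "properties": {"to": {"type": "string"},
                                      "body": {"type": "string"}}},
    },
}

def tool_index() -> list[str]:
    """Cheap listing shipped with every prompt: names only, no schemas."""
    return sorted(FULL_DEFINITIONS)

def search_tools(query: str) -> list[dict]:
    """On-demand lookup: return full definitions only for matching tools."""
    q = query.lower()
    return [d for d in FULL_DEFINITIONS.values()
            if q in d["name"] or q in d["description"].lower()]
```

With dozens of servers, the prompt carries a short name list instead of every parameter schema, which is where the reported 47 % reduction comes from.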
Performance on professional knowledge tests also jumps markedly. On the GDPval suite, which evaluates competence across 44 occupations, GPT‑5.4 scores 83 %, meaning it outperforms 83 % of human professionals in those domains, versus roughly 72 % for GPT‑5.2 (Cung). The report highlights strong gains in law, medicine, accounting and engineering, and notes a “GPT‑5.4 Pro” variant aimed at enterprise customers that pushes the metric even higher. ZDNet corroborates the headline claim, reporting that GPT‑5.4 “clobbers humans on pro‑level work in tests by 83 %” (ZDNet). These results suggest the model is moving from a general‑purpose chatbot toward a specialist assistant capable of handling high‑stakes, domain‑specific tasks with near‑expert proficiency.
Pricing reflects the added capabilities: OpenAI charges $2.50 per million input tokens and $10 per million output tokens for GPT‑5.4 (Cung). While the rates are higher than those for earlier GPT‑5 models, the cost is offset by the token savings from tool search and the productivity gains from native computer use and the expanded context window. For enterprises that need to automate complex workflows, run large‑scale code analyses, or deploy AI agents that can interact with real‑world software, the price‑performance equation appears favorable. As OpenAI rolls out the model, developers will have to weigh the trade‑offs between token consumption, latency and the new functional breadth that GPT‑5.4 brings to the AI development stack.
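The price‑performance trade‑off is easy to quantify from the article’s own numbers: $2.50 per million input tokens, $10 per million output tokens, and up to 47 % fewer prompt tokens when tool definitions are fetched on demand. The workload sizes in the sketch below are made‑up examples, and treating the 47 % figure as a flat saving on the whole prompt is an optimistic simplification of the report’s "up to" claim.

```python
# Rough per-request cost comparison using the article's published rates.
# The 47% saving is the report's "up to" figure, applied here as a flat
# reduction on the prompt for illustration only.

INPUT_PER_M, OUTPUT_PER_M = 2.50, 10.00   # USD per million tokens
TOOL_SEARCH_SAVINGS = 0.47                # fraction of prompt tokens saved

def request_cost(input_tokens: int, output_tokens: int,
                 tool_search: bool = False) -> float:
    """Estimated USD cost of one request, optionally with tool search on."""
    if tool_search:
        input_tokens = int(input_tokens * (1 - TOOL_SEARCH_SAVINGS))
    return (input_tokens * INPUT_PER_M + output_tokens * OUTPUT_PER_M) / 1_000_000
```

For a hypothetical tool‑heavy prompt of 40,000 input tokens and 1,000 output tokens, the estimate drops from about $0.11 to about $0.063 per request, which is the kind of arithmetic teams will run when deciding whether the higher per‑token rates pay for themselves.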
Sources
No primary source found (coverage-based)
- Dev.to Machine Learning Tag
This article was created using AI technology and reviewed by the SectorHQ editorial team for accuracy and quality.