
OpenAI launches GPT‑5.4 with native computer use, tool search, and 272K surcharge trap

Published by
SectorHQ Editorial

Photo by Jonathan Kemper (unsplash.com/@jupp) on Unsplash

$272,000. That's the unexpected surcharge trap embedded in OpenAI's newly released GPT‑5.4, which debuted on March 5 and adds native computer use and tool‑search capabilities, reports indicate.

Key Facts

  • Key company: OpenAI

OpenAI's GPT‑5.4 marks a decisive shift from conversational AI to an "operating model" that can directly manipulate a computer's graphical interface. In internal testing, the model achieved a 75% success rate on the OSWorld‑Verified benchmark—a metric that measures real‑time GUI navigation, clicking, typing, and app switching based on visual feedback—surpassing the 72.4% human baseline and the 47.3% score of its predecessor, GPT‑5.2 (EvoLink COO Jessie, March 16, 2026). The improvement hinges on GPT‑5.4's ability to maintain "state consistency," remembering UI elements across multiple applications, a capability that junior human operators typically lack.

A second breakthrough is the introduction of "Tool Search," which eliminates the token bloat that has plagued agents built with extensive toolkits. Previously, developers had to embed full schema definitions for every tool in the system prompt, inflating token counts by 30-50%. GPT‑5.4 instead queries a tool's definition on demand, cutting token usage by 47% on Scale's MCP Atlas benchmark without sacrificing accuracy (EvoLink). The feature is exposed via a new tool_search parameter in the OpenAI API's tools array, allowing developers to keep prompts lean while still accessing a library of executables, browsers, and file‑system commands.
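As a rough illustration, the difference is in what the request carries: a single tool_search entry rather than every tool's full schema. The field names and payload shape below are assumptions for illustration, not a documented OpenAI schema; only the tool_search parameter and the tools array are named in the reporting.

```python
# Sketch of a request payload using the tool_search mechanism described
# above. The exact key names are assumptions, not a confirmed API schema.

def build_agent_request(user_message: str) -> dict:
    """Build a chat request that defers tool schemas to on-demand lookup."""
    return {
        "model": "gpt-5.4",
        "messages": [{"role": "user", "content": user_message}],
        # Instead of embedding full schema definitions for every tool
        # (which the article says inflated prompts by 30-50%), a single
        # tool_search entry lets the model fetch definitions on demand.
        "tools": [{"type": "tool_search"}],
    }

payload = build_agent_request("Rename every .txt file in the reports folder to .md")
```

The prompt stays the same size no matter how large the tool library grows, which is where the reported 47% token savings would come from.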

The model's expanded context window—up to 1 million tokens—carries a hidden pricing cliff that has already caught several early adopters off guard. OpenAI charges standard rates of $2.50 per million input tokens and $15 per million output tokens up to 272K tokens, but any session that exceeds that threshold is billed at double the input rate and 1.5× the output rate (EvoLink). This "272K surcharge" can inflate a modest‑scale deployment into a $272,000 bill if context management is ignored. EvoLink mitigates the risk with an auto‑truncation layer and recommends context caching at $0.25 per million tokens for static repositories, while keeping the active working context lean enough to stay under the cliff.
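The cliff is easy to model as back-of-the-envelope arithmetic. The rates come from the reporting; two details it leaves unspecified are assumed here: that the threshold counts input and output tokens together, and that once crossed, the higher rates apply to the entire session rather than only the tokens past the threshold.

```python
# Back-of-the-envelope cost model for the 272K surcharge described above.
# Assumptions (not confirmed by the source): the threshold counts input +
# output tokens, and the surcharge rates apply to the whole session.

SURCHARGE_THRESHOLD = 272_000   # tokens
INPUT_RATE = 2.50               # $ per million input tokens, standard tier
OUTPUT_RATE = 15.00             # $ per million output tokens, standard tier

def session_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the dollar cost of one session under the tiered pricing."""
    over_cliff = input_tokens + output_tokens > SURCHARGE_THRESHOLD
    in_rate = INPUT_RATE * (2.0 if over_cliff else 1.0)    # 2x input past the cliff
    out_rate = OUTPUT_RATE * (1.5 if over_cliff else 1.0)  # 1.5x output past the cliff
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# A session just under the cliff vs. one just over it:
under = session_cost(250_000, 20_000)
over = session_cost(280_000, 20_000)
```

Under these assumptions, crossing the threshold roughly doubles the cost of an otherwise similar session, which is why auto-truncation and context caching matter at scale.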

Developers can access GPT‑5.4 through the updated OpenClaw client, which now supports the gpt‑5.4 and gpt‑5.4‑pro endpoints after merging PR #36590 and fixing coordinate drift on high‑DPI displays (Issue #36817). A sample configuration file shows the model enabled with both computer_use and tool_search capabilities, and an allow‑list that restricts execution to "exec," "browser," "read," and "write" operations (EvoLink). The Pro tier, priced at $30 per million input tokens versus $2.50 for the standard tier, delivers a notable performance jump on out‑of‑distribution tasks—scoring 83.3% versus 73.3% on the ARC‑AGI‑2 benchmark—and on high‑complexity mathematics, where it reaches 38.0% versus 27.1% for the standard model (EvoLink).
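A configuration along these lines might look like the following. The capabilities and the four allow-listed operations come from the reporting's description of the sample file; the key names and JSON layout are assumptions, since the actual OpenClaw config format is not shown.

```json
{
  "model": "gpt-5.4",
  "capabilities": {
    "computer_use": true,
    "tool_search": true
  },
  "allow_list": ["exec", "browser", "read", "write"]
}
```

The allow-list acts as a guardrail: any operation the model attempts outside those four categories would be refused by the client rather than executed.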

Industry observers note that the “operating model” label reflects a broader trend toward AI systems that act rather than merely chat. Ars Technica has warned that OpenAI is deliberately opaque about the internal reasoning of GPT‑5.4, suggesting the company is wary of scrutiny over how the model decides which tool to invoke and when to manipulate the UI (Ars Technica). Nonetheless, the technical gains are clear: GPT‑5.4 is the first model to statistically outperform humans at GUI navigation, and its dynamic tool lookup slashes token overhead dramatically. Enterprises that can engineer around the 272 K surcharge and choose the appropriate tier stand to reap the productivity benefits of truly autonomous AI agents.

Sources

Primary source

No primary source found (coverage-based)

Other signals
  • Dev.to AI Tag

Reporting based on verified sources and public filings. Sector HQ editorial standards require multi-source attribution.

