OpenAI Warns AI Industry Faces Compute Shortage, Outages and Soaring GPU Costs
98.95%—that’s Anthropic’s recent API uptime, lagging the 99.99% industry benchmark, according to The‑Decoder, which warns the AI sector is hitting a compute crunch marked by outages, rationing and soaring GPU costs.
Key Facts
- Key company: OpenAI
- Also mentioned: Anthropic
OpenAI’s decision to pull the plug on Sora—its once‑celebrated video‑generation demo—has become the most visible symptom of a deeper, industry‑wide crunch. The company announced that both the web and app versions of Sora will disappear on April 26, with the API slated for shutdown in September, in order to reroute scarce GPU cycles toward its new “Spud” model that powers coding assistants and enterprise‑grade tools (The‑Decoder). CFO Sarah Friar told the Wall Street Journal she now spends “much of her time hunting for near‑term compute capacity,” underscoring how even the market leader is forced to triage projects when the silicon supply chain can’t keep pace with demand.
The crunch is not limited to OpenAI. Anthropic, whose Claude chatbot and Claude Code app have been gaining traction, is already feeling the heat. According to the Wall Street Journal, the Claude API’s uptime over the 90‑day window ending April 8 was 98.95 percent—well short of the 99.99 percent reliability benchmark that cloud giants typically uphold. The dip has tangible business consequences: enterprise customers such as Retool’s founder David Hsu have begun migrating to OpenAI after Anthropic’s service “kept going down” (WSJ). Anthropic’s revenue trajectory is staggering—$9 billion ARR at the end of 2025, $14 billion in February, and a jump to over $30 billion just two months later—but the rapid growth is outstripping the compute it can reliably deliver (The‑Decoder).
Token consumption is exploding in tandem with the surge of “agentic” AI—autonomous tools that can execute tasks without human prompting. OpenAI’s own API traffic leapt from 6 billion tokens per minute in October to 15 billion by the end of March, a 2.5‑fold increase that the WSJ attributes to the proliferation of developer‑facing agents (WSJ). GitHub’s Copilot, for instance, announced fresh usage caps on April 10, explicitly citing the “rapid growth, high concurrency” of agentic workloads (The‑Decoder). These limits are a blunt acknowledgment that the underlying hardware cannot absorb the current velocity of requests.
The hardware shortage is reflected in price signals. The Ornn Compute Price Index, tracking GPU market rates, shows a 48 percent jump in GPU costs over the past few months (The‑Decoder). Bank of America analysts, citing the same data, warn that demand will continue to outstrip supply through at least 2029, suggesting that the current squeeze may become a multi‑year structural bottleneck (The‑Decoder). For startups and established firms alike, the rising cost of compute translates directly into higher operating expenses and tighter margins, prompting many to reconsider product roadmaps or throttle back ambitious features.
All told, the compute crunch is reshaping the AI landscape as quickly as the boom created it. Providers are scrambling to impose new limits, developers are forced to prioritize token‑efficient prompts, and the most visible players are trimming the fat to keep the lights on. As The‑Decoder notes, the AI boom is “consuming compute faster than the industry can supply it,” and the fallout is already visible in outages, product cancellations, and a near‑term price surge that could redefine the economics of AI for years to come.
Sources
Reporting based on verified sources and public filings. Sector HQ editorial standards require multi-source attribution.