The AI industry is running out of compute, with outages, rationing, and rising GPU prices
Key Points
- Surging demand for agentic AI is creating a massive capacity crisis. According to the Wall Street Journal, Anthropic's API availability sits at just 98.95 percent, well below the industry standard of 99.99 percent, and the company is already losing enterprise customers to OpenAI as a result.
- OpenAI is shutting down its video generation app Sora to free up compute for coding and enterprise products. Token usage jumped from 6 billion to 15 billion per minute between October and March.
- GPU prices have climbed 48 percent, according to the Ornn Compute Price Index. Bank of America analysts expect demand to outstrip supply through at least 2029.
Surging demand for AI agents is colliding with limited compute capacity. Anthropic is struggling with outages, OpenAI announced the end of Sora, and GPU prices have jumped nearly 50 percent, according to market data.
The AI boom is consuming compute faster than the industry can supply it. According to a Wall Street Journal report, the explosive growth of agentic AI (autonomous tools that complete tasks on their own) has triggered a massive capacity crisis in recent months. The consequences include frequent outages at leading providers, canceled or scaled-back products, and sharply rising chip prices.
Anthropic is growing fast but can't keep the lights on
Anthropic, the maker of the Claude chatbot and the Claude Code coding app, has been hit especially hard. Since mid-February, outages have piled up so frequently that some enterprise customers are switching to other providers, the WSJ reports. David Hsu, founder of the software platform Retool, told the WSJ he actually prefers Anthropic's Opus 4.6 model but recently moved to OpenAI because Anthropic's service kept going down.
According to the report, the Claude API's uptime over the 90 days ending April 8 was 98.95 percent, far below the 99.99 percent standard that established cloud providers typically maintain.
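The gap between those two figures is larger than the decimals suggest. A quick back-of-the-envelope conversion makes it concrete (a sketch; the 90-day window comes from the report, the rest is simple arithmetic):

```python
# Convert an uptime percentage into implied downtime over a 90-day window.
WINDOW_HOURS = 90 * 24  # the 90-day measurement window cited in the report

def downtime_hours(uptime_percent: float) -> float:
    """Hours of downtime implied by an uptime percentage over the window."""
    return (1 - uptime_percent / 100) * WINDOW_HOURS

claude = downtime_hours(98.95)    # Claude API uptime, per the WSJ
standard = downtime_hours(99.99)  # typical cloud-provider standard

print(f"98.95% uptime -> {claude:.1f} hours of downtime")
print(f"99.99% uptime -> {standard * 60:.0f} minutes of downtime")
```

In other words, 98.95 percent uptime allows roughly a full day of outages over the period, while the 99.99 percent standard allows only about a quarter of an hour.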
At the same time, Anthropic is growing at a staggering pace. The company's annualized revenue run rate (ARR) stood at $9 billion at the end of 2025, rose to $14 billion in February, and crossed $30 billion just two months later.
OpenAI is killing Sora to free up compute for coding and enterprise tools
OpenAI is feeling the squeeze, too. The company recently announced it will shut down its Sora video generation app, partly to redirect compute toward coding and enterprise products built on a new AI model codenamed Spud. The web and app versions of Sora will go offline on April 26, with the API following in September.
Token usage across OpenAI's API jumped from 6 billion per minute in October to 15 billion per minute by the end of March, according to the WSJ. OpenAI CFO Sarah Friar told the WSJ that she spends much of her time hunting for near-term compute capacity and that the company is making difficult decisions about which projects to shelve because resources simply aren't available.
Providers have been rolling out new limits since January to manage the agent boom
The capacity crisis is also reshaping plans for developer tools, which increasingly run agentic workloads that consume far more tokens.
GitHub announced new limits for Copilot on April 10, explicitly citing rapid growth, high concurrency, and intensive usage as reasons. Users who hit the new caps have to wait or switch to other models.
OpenAI also shifted its Codex billing for enterprises from flat message-based pricing to token-based metering in early April and introduced a new $100 Pro tier designed for longer, compute-heavy coding sessions. The cheaper Plus plan was adjusted to favor many shorter sessions spread across the week rather than single intensive bursts.
Windsurf replaced its credit system in March with daily and weekly quotas, offering add-on capacity at API prices. Anthropic adjusted its session limits in late March and temporarily offered double usage during off-peak hours to spread the load more evenly.
The broader trend is clear: regular chat and agentic work are increasingly priced separately, with heavy workloads managed through dedicated pools, credits, and token-based surcharges.
GPU prices are surging as infrastructure fails to keep pace
Spot market prices for Nvidia's GPUs have climbed sharply. According to the Ornn Compute Price Index, one hour on a latest-generation Blackwell chip now costs $4.08, a 48 percent increase from $2.75 just two months ago.
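The 48 percent figure can be sanity-checked against the two quoted prices (a quick verification of the arithmetic, not data from the index itself):

```python
# Sanity-check the reported GPU spot-price jump from the two quoted figures.
old_price = 2.75  # $/hour on a Blackwell chip two months ago, per the index
new_price = 4.08  # $/hour now

increase = (new_price - old_price) / old_price * 100
print(f"Price increase: {increase:.1f}%")  # ~48%, matching the reported figure
```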
The WSJ reports that CoreWeave, one of the largest publicly traded AI cloud providers, raised prices by more than 20 percent toward the end of 2025 and now requires smaller customers to sign three-year contracts instead of one. Bank of America analysts expect demand to outstrip supply through at least 2029.
Vultr CEO J.J. Kardwell told the WSJ that the current capacity crisis is unlike anything he has seen in more than five years of running his cloud infrastructure business. He pointed to long hardware lead times, slow data center buildouts, and the fact that available power through 2026 is already spoken for as the main bottlenecks.
Raising prices is one way to address the shortage. But for the leading AI companies locked in a fierce battle for users, that would be a risky move.