The Compute Crunch
Why infrastructure is now the basis of competition
AI is hitting a wall on compute. And unlike past tech bottlenecks, this one is showing up everywhere at once - pricing, product decisions, reliability, even roadmap triage. A few things that feel underappreciated:
1) Tokens are the real scarce resource - not GPUs
We talk about chips, but the unit that actually matters is tokens per second. OpenAI’s API usage went from ~6B tokens/minute to 15B in ~5 months.
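A quick back-of-envelope on what that trajectory implies. The ~6B and ~15B figures are the ones cited above; the growth rate is simply derived from them, so treat it as illustrative rather than a forecast.

```python
# Implied growth from the token figures cited above (illustrative only).
start_rate = 6e9     # ~6B tokens per minute
end_rate = 15e9      # ~15B tokens per minute
months = 5

monthly_growth = (end_rate / start_rate) ** (1 / months) - 1   # compound monthly rate
annualized = (1 + monthly_growth) ** 12

print(f"~{monthly_growth:.0%} per month")           # ~20% per month
print(f"~{annualized:.0f}x per year if sustained")  # ~9x per year
```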
2) “Agentic AI” broke the capacity model
Chatbots were bursty. Agents are persistent. They don’t just answer - they loop, plan, retry, call tools. Same user → 10–100x more compute.
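A rough sketch of why the multiplier lands in that range. Every number here is an assumption picked for illustration (step counts, context sizes, and retry rates vary wildly by product); the point is only that an agent re-sends its growing context on every step.

```python
# Illustrative token accounting: one chat turn vs. one agentic task.
# All figures below are assumptions, not measurements from any real product.

chat_turn = 1_500 + 500               # prompt + completion for a single answer

steps = 15                            # plan / act / observe iterations
context_per_step = 8_000              # conversation + tool results re-sent each step
output_per_step = 700                 # reasoning + tool-call arguments per step
retry_factor = 1.2                    # some steps fail and get retried

agent_task = int(steps * retry_factor * (context_per_step + output_per_step))

print(chat_turn)                      # 2000 tokens
print(agent_task)                     # 156600 tokens
print(round(agent_task / chat_turn))  # ~78x compute for the same "one user request"
```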
3) Reliability is collapsing under success
Anthropic at ~99% uptime sounds fine until you realize the modern internet runs on four nines (99.99%). When “intelligence” becomes infrastructure, 1% downtime feels like a product failure.
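The gap between those two numbers is bigger than it sounds; standard SLA arithmetic makes it concrete:

```python
# Allowed downtime at each availability tier (standard SLA arithmetic).
minutes_per_month = 30 * 24 * 60    # 43,200

for availability in (0.99, 0.999, 0.9999):
    downtime = minutes_per_month * (1 - availability)
    print(f"{availability:.2%} uptime -> ~{downtime:.0f} min of downtime/month")

# 99.00% uptime -> ~432 min of downtime/month  (about 7 hours)
# 99.90% uptime -> ~43 min of downtime/month
# 99.99% uptime -> ~4 min of downtime/month
```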
4) Pricing power is back (and uncomfortable)
Blackwell GPUs: +48% in ~2 months
CoreWeave: +20% price hikes + multi-year commits
Spot markets tightening across the board
5) The response is ruthless prioritization
In a normal market, prices would keep rising until demand cools. But frontier labs are in a land grab for users, so they can’t just pass this through. Instead they do what constrained platforms always do: ration in less visible ways (sketched in code after the list).
Rate limits.
Usage caps.
Tiering.
Latency tradeoffs.
Feature gating.
Selective generosity for strategic customers.
Quiet deprioritization for everyone else.
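A minimal sketch of how that rationing tends to surface as product policy. The tier names and limits below are invented for illustration; no real provider’s numbers are implied.

```python
# Hypothetical tier table: rationing expressed as product configuration.
# All names and limits are made up for illustration.
TIERS = {
    "free":       {"requests_per_min": 3,   "tokens_per_day": 50_000,    "queue_priority": 3},
    "pro":        {"requests_per_min": 60,  "tokens_per_day": 2_000_000, "queue_priority": 2},
    "enterprise": {"requests_per_min": 600, "tokens_per_day": None,      "queue_priority": 1},
}

def admit(tier: str, used_today: int, requested: int) -> bool:
    """Return False when a request would blow through the tier's daily token cap."""
    cap = TIERS[tier]["tokens_per_day"]
    return cap is None or used_today + requested <= cap

# The same request succeeds or silently stalls depending on tier, not on the model:
print(admit("free", used_today=48_000, requested=5_000))   # False: capped mid-session
print(admit("pro",  used_today=48_000, requested=5_000))   # True
```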
This is the awkward truth of the current AI economy: many products are being sold into a market where demand is growing faster than the underlying infrastructure can comfortably support. That mismatch gets hidden in policy, packaging, and degraded user experience.
6) Compute constraints are shaping the roadmap
Features are increasingly being delayed, limited, or killed because they do not fit the compute budget. OpenAI shelving Sora (in part) to free up compute is a canary. That is a very different world from one where products live or die on user demand, technical feasibility, or strategic taste alone.
Now you can build something impressive, prove people want it, and still lose the internal argument because the inference bill is too ugly.
Compute is no longer just an input. It’s the constraint shaping this market.
Which brings us to OpenAI and its suspiciously well-timed “leaks.”
Yesterday, a memo from OpenAI CRO Denise Dresser leaked online. The message was simple: compute is our edge, and Anthropic underbought. The memo:
calls out Anthropic’s “strategic misstep” in not securing enough compute
ties that directly to throttling, weaker availability, and reliability issues
positions their own capacity as enabling higher token limits, lower latency, more reliable execution
In a world where customers are hitting limits in 45 minutes or routing around outages, this gap can compound fast. OpenAI is trying to shift the conversation from “Whose model is better?” to “Whose system can actually carry enterprise demand?”
Dario Amodei’s earlier argument for compute restraint was rational:
“So when we go to buying data centers, again, the curve I’m looking at is: we’ve had a 10x a year increase every year. At the beginning of this year, we’re looking at $10 billion in annualized revenue. We have to decide how much compute to buy. It takes a year or two to actually build out the data centers, to reserve the data center.
Basically I’m saying, “In 2027, how much compute do I get?” I could assume that the revenue will continue growing 10x a year, so it’ll be $100 billion at the end of 2026 and $1 trillion at the end of 2027. Actually it would be $5 trillion dollars of compute because it would be $1 trillion a year for five years. I could buy $1 trillion of compute that starts at the end of 2027. If my revenue is not $1 trillion dollars, if it’s even $800 billion, there’s no force on earth, there’s no hedge on earth that could stop me from going bankrupt if I buy that much compute.
Even though a part of my brain wonders if it’s going to keep growing 10x, I can’t buy $1 trillion a year of compute in 2027. If I’m just off by a year in that rate of growth, or if the growth rate is 5x a year instead of 10x a year, then you go bankrupt. So you end up in a world where you’re supporting hundreds of billions, not trillions. You accept some risk that there’s so much demand that you can’t support the revenue, and you accept some risk that you got it wrong and it’s still slow.
When I talked about behaving responsibly, what I meant actually was not the absolute amount. I think it is true we’re spending somewhat less than some of the other players. It’s actually the other things, like have we been thoughtful about it or are we YOLOing and saying, “We’re going to do $100 billion here or $100 billion there”? I get the impression that some of the other companies have not written down the spreadsheet, that they don’t really understand the risks they’re taking. They’re just doing stuff because it sounds cool.
We’ve thought carefully about it.”
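Replaying that arithmetic with the numbers from the quote (simplified to a single year’s commitment; he notes a multi-year deal pushes the exposure toward $5T):

```python
# The scenario from the quote above, replayed as a simple sensitivity check.
revenue_now = 10e9    # ~$10B annualized at the start of the year

def revenue_in(years: int, growth_per_year: float) -> float:
    return revenue_now * growth_per_year ** years

# Commit 2027 compute as if 10x/year growth keeps holding:
committed = revenue_in(2, 10)                    # $1T of capacity sized to expected revenue

# Downside cases he describes: growth is 5x instead of 10x, or arrives a year late.
scenarios = {
    "growth is 5x/year":     revenue_in(2, 5),   # $250B
    "growth is a year late": revenue_in(1, 10),  # $100B
}

for label, actual in scenarios.items():
    gap = committed - actual
    print(f"{label}: ${actual/1e9:,.0f}B revenue vs ${committed/1e9:,.0f}B committed"
          f" -> ${gap/1e9:,.0f}B shortfall")
```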
He basically said that if you overbuild based on heroic assumptions and the demand curve arrives a year late, you can bankrupt yourself. Correct. That is what happens when ambition meets fixed costs and loses. But the market has moved. Fast.
What looked prudent six months ago can look underbuilt today. “We thought carefully about it” is a noble line, but it is less persuasive when your customers are staring at a usage cap.
A few days ago, another leaked memo from OpenAI to investors titled ‘Compute is the Ballgame’ laid out their advantage as they see it: while they are planning to have 30 GW of compute by 2030, they expect Anthropic to have ~7-8 GW by the end of 2027.
I am skeptical of leaks that so neatly flatter the source. When a document “escapes” and somehow reinforces exactly the narrative the company wants enterprise buyers to internalize, you are most likely looking at marketing with plausible deniability. Still, the underlying point stands.
Compute strategy has stopped being a background finance question and started becoming a customer-facing product variable. However this plays out, it seems compute will be one of the key bases of competition moving forward.


