On-Prem vs Cloud GPUs: The Economics Have Quietly Shifted

Hopper and Blackwell supply normalised. GPU-as-a-service margins compressed. For the first time in five years, owning silicon is a genuinely sensible default for some workloads.

By the DataX Power team

The shift nobody announced

For the first years of the GPU crunch (roughly 2022-2024), the economics of enterprise AI compute were straightforward: you could not buy H100s at anything like list price, lead times stretched to multiple quarters, and renting from hyperscalers – even at a premium – was the only way to ship. Every financial model assumed "cloud GPU is the baseline because there is no alternative."

That assumption stopped being correct somewhere in the middle of 2025. Hopper-class supply normalised. Blackwell shipped in volume. GPU-as-a-service pricing compressed as Lambda, Crusoe, CoreWeave, and a wave of regional providers pressured hyperscaler margins. Enterprise procurement teams started getting quotes back the same month, not the same quarter. The crunch ended, quietly, and the financial-model default that cloud GPU is always the right answer stopped being automatically true.

For any AI workload with steady or predictable compute demand, that calculus is now worth actually running. For a meaningful share of enterprise workloads in 2026, owning GPUs – or contracting for dedicated capacity from a specialist – comes out substantially ahead.

The break-even math in round numbers

An H100 SXM5 GPU has a list price around US$25-30k; an 8-way HGX H100 system lands between US$250k and US$350k depending on configuration, networking, and support. At the same time, on-demand pricing for a comparable 8×H100 instance on AWS, Azure, or GCP sits roughly at US$30-40/hour in 2026, with 1-3 year reserved pricing around US$15-25/hour.

Run the arithmetic at 70% utilisation (achievable with proper scheduling – see our post on Kueue): 8760 hours × 0.7 × US$20/hour ≈ US$123k per year in reserved cloud cost. A US$300k owned system breaks even against that in roughly 30 months on hardware cost alone; add power and cooling overheads of another 30-40% and break-even lands a little past the three-year mark, still comfortably inside a four-to-five-year hardware life. For continuous workloads, the math pencils. For bursty workloads that sit idle 80% of the time, cloud remains cheaper by a wide margin.

That crossover point – roughly 35-45% utilisation, depending on amortisation period and the reserved rate you compare against – is the single number most procurement decisions in 2026 hinge on. Below it, cloud. Above it, owned or dedicated. This is not radical math; it is the same math organisations have used for decades on non-AI infrastructure. It simply stopped applying during the crunch, and has quietly started applying again.
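The model is simple enough to sanity-check in a few lines. A minimal sketch in Python, with the amortisation period, overhead fraction, and hourly rates all illustrative assumptions rather than quotes:

```python
# Rough break-even model for one 8-GPU system. Every input is an
# illustrative assumption; replace with your own quotes.
HOURS_PER_YEAR = 8760

def annual_cloud_cost(rate_per_hour: float, utilisation: float) -> float:
    """Reserved-cloud spend at a given average utilisation."""
    return HOURS_PER_YEAR * utilisation * rate_per_hour

def annual_owned_cost(system_price: float, amortisation_years: float,
                      overhead_fraction: float) -> float:
    """Straight-line hardware amortisation plus power/cooling overhead."""
    return (system_price / amortisation_years) * (1 + overhead_fraction)

def crossover_utilisation(system_price: float, amortisation_years: float,
                          overhead_fraction: float, rate_per_hour: float) -> float:
    """Utilisation above which owning beats reserved cloud."""
    owned = annual_owned_cost(system_price, amortisation_years, overhead_fraction)
    return owned / (HOURS_PER_YEAR * rate_per_hour)

print(f"{annual_cloud_cost(20.0, 0.7):,.0f}")                  # 122,640 -> ~US$123k/yr
print(f"{annual_owned_cost(300_000, 5, 0.35):,.0f}")           # 81,000/yr owned
print(f"{crossover_utilisation(300_000, 5, 0.35, 20.0):.0%}")  # 46% at US$20/hr
# At a US$25/hr reserved rate, the crossover drops to roughly 37%.
```

Note that this deliberately excludes operations cost; the honest-TCO section below adds it back.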

Where cloud still wins, clearly

Three workload shapes still tilt decisively toward cloud in 2026.

  • Bursty experimentation. R&D teams running occasional large jobs on H100 or B200 class hardware, with utilisation in the 5-20% range. Owning silicon for this is a liquidity sink.
  • Workloads requiring the newest silicon. If the workload only pencils on Blackwell or H200 and your organisation would not order the hardware for another year on its own cycle, cloud puts you on that silicon roughly 18 months sooner.
  • Workloads requiring multi-region failover. Deploying owned capacity across multiple regions for disaster recovery recreates hyperscaler economics without hyperscaler discipline.

Where owned or dedicated clearly wins

Equally, a clear set of workloads now tilts toward owned capacity.

  • Predictable inference at scale. Inference workloads with steady traffic and strict latency requirements run best on dedicated hardware – predictable cost, predictable tail latency, no noisy neighbours. This is the single-largest category of workloads shifting to on-prem or dedicated capacity in 2026.
  • Sustained training campaigns. Organisations that train or fine-tune continuously – often weekly or daily – cross the utilisation threshold easily. Anthropic, OpenAI, and the other frontier labs lock up owned or long-term dedicated capacity for exactly this reason; their sub-scale peers are making the same arithmetic work.
  • Data-residency-constrained workloads. The compliance simplification from keeping inference inside the perimeter is frequently worth more than the compute cost difference. See our earlier post on APAC data residency.
  • High-value fine-tuning. When a team is iterating on a proprietary model variant, the combination of data sensitivity and cost-predictability pushes the procurement answer toward owned or dedicated.

The dedicated-capacity middle ground

The interesting category in 2026 is not "cloud versus owned." It is "cloud versus dedicated cloud versus owned." A wave of specialist providers – Lambda, Crusoe, CoreWeave, Fluidstack, Genesis Cloud, Voltage Park, several APAC regionals – now offer dedicated GPU capacity at meaningfully lower prices than hyperscalers, often with 1-12 month commitments rather than 1-3 year reservations.

The pattern that works in practice: hyperscaler for the long tail of experimentation and spiky workloads; dedicated specialist for sustained training or inference at scale; owned for the fully predictable, compliance-constrained core. The three-tier shape beats any single-provider strategy on total cost and resilience, and it is not appreciably more complex to operate if your orchestration layer (Kubernetes, SLURM, Ray) is abstracted correctly.
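What "abstracted correctly" means can be reduced to a toy rule: jobs carry a profile, and one function decides which tier's queue they land on. The queue names and the 0.40 threshold below are hypothetical stand-ins, not real Kubernetes, SLURM, or Ray identifiers:

```python
# Toy three-tier router. Queue names and the threshold are illustrative
# stand-ins for whatever your scheduler actually exposes.
def target_queue(residency_constrained: bool, expected_utilisation: float) -> str:
    if residency_constrained:
        return "owned-cluster"           # compliance-constrained core
    if expected_utilisation >= 0.40:     # above the crossover band
        return "dedicated-specialist"    # sustained training or inference
    return "hyperscaler-on-demand"       # experimentation and spiky load

assert target_queue(False, 0.05) == "hyperscaler-on-demand"
assert target_queue(True, 0.90) == "owned-cluster"
```

If the routing rule lives in one place like this, adding or dropping a provider is a queue-mapping change, not a re-platforming exercise.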

What an honest TCO looks like

The mistake most organisations make when running this calculation is comparing raw GPU-hour prices. The honest comparison includes more lines; the sketch after this list rolls them up into one comparable annual figure.

  • Compute – GPU-hour cost, amortised hardware, reserved pricing, commitment discounts.
  • Networking – egress, VPC peering, InfiniBand-equivalent fabric on owned gear.
  • Storage – high-performance training storage is surprisingly expensive on hyperscalers and often underestimated.
  • Power and cooling – typically 30-50% on top of hardware amortisation for owned gear; cloud has no separate line because it is folded into the hourly rate.
  • Operations – on-call for owned capacity is real work; for cloud it is typically the provider's problem until it is not.
  • Lead time risk – if the workload is strategically critical, capacity risk has a real cost. Cloud usually wins on this axis; specialists with long-term contracts have narrowed the gap.
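One way to keep the comparison honest is to force every line item into the same structure before comparing totals. A sketch, with every figure a placeholder awaiting real quotes and metered costs:

```python
# Annual TCO line items for one capacity tier. All numbers below are
# hypothetical placeholders, not benchmarks.
from dataclasses import dataclass, fields

@dataclass
class AnnualTCO:
    compute: float        # GPU-hours x rate, or hardware amortisation
    networking: float     # egress, peering, or fabric amortisation
    storage: float        # high-performance training storage
    power_cooling: float  # zero line for cloud; folded into the hourly rate
    operations: float     # on-call, patching, capacity management
    capacity_risk: float  # a priced estimate of lead-time exposure

    def total(self) -> float:
        return sum(getattr(self, f.name) for f in fields(self))

# Hypothetical comparison for one sustained inference workload
owned = AnnualTCO(compute=60_000, networking=12_000, storage=18_000,
                  power_cooling=21_000, operations=60_000, capacity_risk=15_000)
reserved = AnnualTCO(compute=123_000, networking=25_000, storage=40_000,
                     power_cooling=0, operations=20_000, capacity_risk=0)
print(f"owned ${owned.total():,.0f}/yr vs reserved ${reserved.total():,.0f}/yr")
```

With these placeholder numbers, a raw compute gap of roughly US$63k shrinks to about US$22k once operations, networking, and storage are counted. That narrowing – in either direction – is exactly why the raw GPU-hour comparison misleads.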

The procurement playbook for 2026

A concrete sequence that lands well in most enterprise environments: segment your workload portfolio into bursty, sustained, and compliance-constrained buckets; price each against all three tiers (hyperscaler on-demand, dedicated specialist with commitment, owned capacity) with honest TCO; and then run two procurement RFIs – one for specialist dedicated capacity, one for owned hardware – in parallel with your hyperscaler renewal conversation.
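A skeleton for the pricing step, structured the way the RFIs come back: per-hour rates for the two cloud tiers, a fixed annual TCO for owned. Bucket sizes and rates are hypothetical placeholders:

```python
# Step two of the playbook: price each bucket against all three tiers.
# Every rate and bucket size is a placeholder awaiting a real quote.
HOURS_PER_YEAR = 8760

buckets = {  # name -> GPU-hours per year for one 8-GPU system's worth of load
    "bursty": HOURS_PER_YEAR * 0.10,
    "sustained": HOURS_PER_YEAR * 0.70,
}

def tier_costs(gpu_hours: float) -> dict[str, float]:
    return {
        "hyperscaler-on-demand": gpu_hours * 35.0,  # within the US$30-40/hr range above
        "dedicated-specialist": gpu_hours * 20.0,   # hypothetical committed rate
        "owned": 186_000.0,  # fixed annual TCO (see above), utilisation-independent
    }

for name, hours in buckets.items():
    print(name, {tier: f"${cost:,.0f}" for tier, cost in tier_costs(hours).items()})
# Compliance-constrained buckets go to owned or in-perimeter dedicated
# capacity regardless of price, per the residency discussion above.
```

Run across a real portfolio, this is the table the three-tier split falls out of – and the table you take into all three negotiations.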

Organisations that only negotiate with their incumbent hyperscaler will not learn how much the market has moved. The specialists are hungry and know it; the hyperscalers know it too. The negotiating leverage cloud providers held during the crunch has materially softened, and most enterprise buyers have not yet adjusted their expectations. The gap between what a well-run 2026 AI compute procurement looks like and what most organisations actually sign is the largest single opportunity on their AI infrastructure P&L this year.
