Enterprises are spending billions on AI infrastructure, and new data shows 95 cents of every dollar spent on GPU compute is going to waste. Here's what's changed, and what you need to do about it.
The GPU Utilization Catastrophe No One Is Talking About
Cast AI's 2026 State of Kubernetes Optimization Report landed this week with a number that should alarm every executive signing off on AI infrastructure budgets: average GPU utilization on Kubernetes clusters sits at just 5%.
Read that again. Five percent.
Cloud providers are raising GPU prices (AWS increased H200 Capacity Block pricing by 15% in January 2026), yet enterprises are leaving 95% of that increasingly expensive compute sitting idle. Global AI infrastructure spending hit $82 billion in a single quarter in 2025. If all of that spend ran at 5% utilization, organizations would collectively be burning roughly $78 billion per quarter on idle silicon.
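The arithmetic behind that estimate is easy to check. The snippet below is a minimal sketch that assumes, as the estimate does, that the whole quarterly spend is compute running at the reported average utilization; the function name `idle_spend` is illustrative only.

```python
def idle_spend(total_spend: float, utilization: float) -> float:
    """Return the share of spend that pays for idle capacity."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return total_spend * (1.0 - utilization)


quarterly_spend_billion = 82.0  # quarterly spend figure quoted in the article
avg_utilization = 0.05          # average GPU utilization reported by Cast AI

wasted = idle_spend(quarterly_spend_billion, avg_utilization)
print(f"${wasted:.1f}B idle per quarter")  # prints "$77.9B idle per quarter"
```

In practice only part of AI infrastructure spend is GPU compute, so this is an upper bound rather than a measurement.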
This is not a technology problem. It is an orchestration problem, and Kubernetes v1.36, released this month, is finally providing the tools to fix it.
Figure: GPU utilization before vs. after Kubernetes 1.36 DRA. Source: Cast AI 2026 State of Kubernetes Optimization Report.
What Kubernetes 1.36 Changes for AI Workloads
The Kubernetes project released version 1.36 (codename Haru) with 70 enhancements, and the most consequential changes are squarely aimed at AI and GPU workloads.
Dynamic Resource Allocation graduates to production-ready. For years, Kubernetes treated GPUs as blunt instruments: you either had the whole GPU or you didn't. Dynamic Resource Allocation (DRA) lets the scheduler understand the specific requirements of a GPU or AI accelerator at a granular level. That enables GPU sharing across multiple workloads, fractional GPU allocation, and multi-node AI job coordination, all of which previously required custom operators and significant engineering overhead.
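To make the DRA model concrete, here is a rough sketch of what a device request can look like, built as a plain Python dictionary and printed as JSON. The field layout follows our understanding of the `resource.k8s.io/v1` API and should be checked against your cluster (for example with `kubectl explain resourceclaimtemplate`). The device class `gpu.example.com`, the template name, and the helper function are hypothetical placeholders, not values from any particular driver.

```python
import json


def gpu_claim_template(name: str, device_class: str, count: int = 1) -> dict:
    """Build a ResourceClaimTemplate manifest asking for `count` devices.

    Assumption: field names follow resource.k8s.io/v1; verify against
    the API version your cluster actually serves.
    """
    return {
        "apiVersion": "resource.k8s.io/v1",
        "kind": "ResourceClaimTemplate",
        "metadata": {"name": name},
        "spec": {
            "spec": {
                "devices": {
                    "requests": [
                        {
                            "name": "gpu",
                            "exactly": {
                                # Hypothetical class; the installed DRA
                                # driver defines the real class name.
                                "deviceClassName": device_class,
                                "allocationMode": "ExactCount",
                                "count": count,
                            },
                        }
                    ]
                }
            }
        },
    }


manifest = gpu_claim_template("inference-gpu", "gpu.example.com")
print(json.dumps(manifest, indent=2))
```

A Pod would then reference the template through `spec.resourceClaims` and attach the claim to a container under `resources.claims`, which is what lets the scheduler place the workload based on the devices actually available.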
NVIDIA formalized this direction at KubeCon EU 2026 by donating its Dynamic Resource Allocation driver for GPUs to the Kubernetes community. This is a significant signal: the largest GPU vendor in the world is now treating native Kubernetes GPU scheduling as the standard path, not a workaround.
Security defaults tighten. User Namespaces and Mutating Admission Policies reach General Availability. For enterprises running AI workloads, this matters because AI jobs often require elevated privileges that create security exposure. The User Namespaces feature maps container root to an unprivileged host user, closing off a major attack surface without requiring application changes.
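Opting a workload into a user namespace is a one-field change in the Pod spec: `hostUsers: false`. The sketch below builds such a manifest as a Python dictionary; the pod name, image, and helper function are hypothetical.

```python
import json


def pod_with_user_namespace(name: str, image: str) -> dict:
    """Build a Pod manifest that runs in its own user namespace."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # False asks the kubelet to map container UIDs, including
            # root, to unprivileged UIDs on the host.
            "hostUsers": False,
            "containers": [{"name": "main", "image": image}],
        },
    }


pod = pod_with_user_namespace("trainer", "registry.example.com/trainer:latest")
print(json.dumps(pod, indent=2))
```

Whether a given node can honor the setting depends on its container runtime and kernel, so it is worth testing on a non-production node pool first.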
Production case studies from CNCF member organizations show that advanced GPU scheduling on Kubernetes can push utilization from 13% to 37% — and some implementations are exceeding 80%. At enterprise GPU spending levels, the difference between 5% and 40% utilization is measured in millions of dollars annually.
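A quick worked example shows why the gap is measured in millions. The numbers below are hypothetical: a workload that needs the equivalent of 10 fully busy GPUs year-round, billed at an assumed $4 per GPU-hour. Only the utilization percentages come from the discussion above.

```python
import math

HOURS_PER_YEAR = 8760


def gpus_needed(busy_gpu_equivalents: int, utilization_pct: int) -> int:
    """GPUs to provision so the useful work fits at a given utilization."""
    if not 0 < utilization_pct <= 100:
        raise ValueError("utilization_pct must be in (0, 100]")
    return math.ceil(busy_gpu_equivalents * 100 / utilization_pct)


def annual_cost(gpus: int, hourly_rate: int) -> int:
    """Yearly bill for `gpus` running around the clock."""
    return gpus * hourly_rate * HOURS_PER_YEAR


demand = 10  # hypothetical: work equal to 10 GPUs busy 100% of the time
rate = 4     # hypothetical price in dollars per GPU-hour

for pct in (5, 40):
    n = gpus_needed(demand, pct)
    print(f"{pct}% utilization: {n} GPUs, ${annual_cost(n, rate):,} per year")

savings = annual_cost(gpus_needed(demand, 5), rate) - annual_cost(gpus_needed(demand, 40), rate)
print(f"difference: ${savings:,} per year")
```

Under these assumptions the same work needs 200 GPUs at 5% utilization but only 25 at 40%, a difference of about $6.1 million a year.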
The Multi-Cloud Layer Is Shifting Too
GPU optimization is not the only tectonic shift this week. AWS previewed a new cross-cloud connectivity service with Google Cloud as its first launch partner, signaling that the era of hyperscaler walled gardens is cracking open.
Google Cloud Next 2026 unveiled cross-cloud caching, which stores cross-cloud data on first read to reduce egress fees and accelerate queries across AWS and Azure datasets. It arrived alongside a Cross-cloud Lakehouse built on Apache Iceberg that lets AI agents access data regardless of where it lives.
The business implication: multi-cloud is no longer purely an IT operations headache to be managed around. The hyperscalers themselves are building interoperability into their platforms. Organizations still locked into single-cloud architectures by technical debt are losing negotiating leverage as the switching cost drops.
Meanwhile, the economics of inference are becoming clearer. Inference now accounts for roughly two-thirds of all AI compute in production environments, and industry data shows inference can represent 80–90% of the lifetime cost of a production AI system. The organizations winning on AI economics are the ones optimizing inference infrastructure, and that means getting Kubernetes GPU scheduling right.
Figure: Multi-cloud connectivity. The AWS and Google Cloud cross-cloud preview and the GCP Cross-cloud Lakehouse, all orchestrated via Kubernetes.
What Executives Should Do Now
- Audit your GPU utilization immediately. If you don't have a number, assume it's near 5%. Run Cast AI, KubeGPU, or a comparable tool against your clusters this week. The data will be uncomfortable but actionable.
- Evaluate upgrading to Kubernetes 1.36. The DRA enhancements alone justify the upgrade cycle for any organization running GPU workloads. Schedule the migration before Q3 AI budget reviews.
- Revisit your GPU procurement strategy. At 5% utilization, buying more GPUs is almost certainly the wrong answer. Right-sizing and scheduling optimization should come first.
- Build a multi-cloud data strategy. The AWS-to-Google connectivity preview and Google's Cross-cloud Lakehouse are early signals of where the market is heading. Organizations without a multi-cloud data architecture are building technical debt today.
- Separate training from inference infrastructure. Training and inference have fundamentally different resource profiles. Kubernetes can now manage both natively, but only if your cluster topology and scheduling policies are designed for the distinction.
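For the audit step, the core calculation is simple once you have per-GPU utilization samples. The sketch below assumes the samples are percentages of the kind a GPU metrics exporter reports (NVIDIA's DCGM exporter, for instance, exposes a `DCGM_FI_DEV_GPU_UTIL` gauge). The sample data, GPU labels, and 10% idle threshold are made up for illustration.

```python
from statistics import mean


def summarize_utilization(
    samples: dict[str, list[float]], idle_threshold: float = 10.0
) -> tuple[float, list[str]]:
    """Return (fleet average %, GPUs whose own average is below threshold)."""
    # Average each GPU's samples first, skipping GPUs with no data.
    per_gpu = {gpu: mean(vals) for gpu, vals in samples.items() if vals}
    fleet_avg = mean(per_gpu.values()) if per_gpu else 0.0
    idle = sorted(gpu for gpu, avg in per_gpu.items() if avg < idle_threshold)
    return fleet_avg, idle


# Hypothetical utilization samples (percent) collected over a time window.
samples = {
    "node-a/gpu0": [0.0, 5.0, 10.0],
    "node-a/gpu1": [80.0, 90.0, 70.0],
    "node-b/gpu0": [0.0, 0.0, 0.0],
}

fleet_avg, idle_gpus = summarize_utilization(samples)
print(f"fleet average: {fleet_avg:.1f}%")
print("mostly idle:", ", ".join(idle_gpus))
```

The fleet figure here is a mean of per-GPU means, which weights every GPU equally; a cost-weighted average may be more useful when the fleet mixes GPU types with very different prices.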
Where ITSulu Fits
The gap between "we have a Kubernetes cluster" and "we have a Kubernetes cluster optimized for AI workloads" is where most enterprise value is being lost right now. ITSulu's Automated Kubernetes Operations practice works directly with infrastructure and platform engineering teams to close that gap: auditing current GPU utilization, implementing DRA-based scheduling, hardening security posture to the new 1.36 defaults, and designing multi-cloud connectivity strategies aligned with where AWS and Google are taking the ecosystem. If your organization is heading into H2 budget cycles with AI infrastructure spend that isn't performing, that conversation is worth having before the numbers get locked.
The Bottom Line
Kubernetes 1.36 and the multi-cloud moves from AWS and Google this week represent a genuine inflection point for enterprise AI infrastructure. The organizations that act on GPU utilization data now — before the next budget cycle — will have a measurable cost advantage over those that wait for the technology to mature further. It already has.