The Agentic Stack Wars: Part Three - EXTRACTION

Your AI budget is already wrong. What Google I/O 2026 means for every organization that treats Agentic AI as a software subscription

Jun 04, 2026

This is Part 3 of a four-part series that I will be publishing over the course of a week. It provides advice about how the agentic stack is being managed and positioned by the major vendor players in the Agentic Stack Wars. Please keep watching your inbox for the final part, or start the series from the beginning with Part 1: CONFESSION.

The Information reported in April that Uber had burned through its entire 2026 AI coding budget in four months. CTO Praveen Neppalli Naga’s response was that the company was:

"back to the drawing board."

That is not a story about a technology company being reckless. Uber runs sophisticated engineering and finance functions. It is a story about what happens when a metered utility gets budgeted like a flat-rate tool, and about how quickly that gap becomes visible once agentic workflows move from pilot to production.

The gap is about to become much more visible to many more organizations.


What actually changed at Google I/O

The consumer pricing announcement at Google I/O last week was covered mostly as a product story. New tiers. New AI agents. New bundles. But underneath the product language was a structural shift that mattered more to organizational buyers than to individual subscribers.

Google announced that Gemini would move to compute-based billing. Not messages. Not seats. Compute consumed. The complexity of the prompt, the length of the conversation, and the features engaged. All of it now factors into how fast you burn through your allocation. When you hit the ceiling, you either stop, downgrade to a lighter model, or buy more.

OpenAI and Anthropic have been moving to the same place through different mechanisms. Anthropic’s Enterprise plan already separates the seat fee from usage — access is the seat, consumption is billed separately at API rates, and administrators can set spend limits and monitor overages. OpenAI’s Business and Enterprise plans use shared credit pools for advanced features, with configurable overage controls when the pool runs out of credits.

The direction of travel is the same across all three. Flat-rate access at the organizational level is a thing of the past. What replaces it is a hybrid model: a contract for access, metered billing for consumption, and contract language to manage the gap between the two.

Procurement teams that have not yet built a robust AI cost governance framework are already behind.

The ROI reality makes this harder

The timing of this commercial shift is uncomfortable because it is arriving before the ROI case is settled.

McKinsey’s 2025 global survey found that 88 percent of organizations reported regular use of AI in at least one business function. Only 39 percent reported any EBIT impact at the enterprise level. In the U.S. CxO data specifically, only 23 percent of executives reported AI delivering favorable cost changes. For nearly half of the respondents, the typical experience was a cost increase before any cost reduction became visible.

IBM’s findings were even sharper: only 25 percent of AI initiatives delivered expected ROI, and only 16 percent scaled enterprise-wide.

This is not an argument that AI does not work. It is an argument that value capture is uneven and harder than the vendor marketing suggests. Organizations are being asked to pay more, with more complexity, at the precise moment that most of them are still working out whether the investment is delivering.

In July 2025, I published a cost projection for a multimodal AI orchestration system for clinical reasoning. My core premise at the time was straightforward: a single orchestrated multi-model query costs 5 to 10 times more than a standard single-model RAG approach, and a 3,000-clinician health system running at scale would be looking at six to seven figures in annual inference costs before any licensing, support, or operational overhead is factored in.
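To make that scale concrete, here is a rough back-of-envelope sketch. Only the 3,000 clinicians and the 5x to 10x multiplier come from the analysis above. The query volume, working days, and baseline per-query cost are hypothetical placeholders chosen for illustration, so swap in your own figures before drawing conclusions.

```python
# Back-of-envelope annual inference cost for an orchestrated multi-model system.
# Amounts are kept in whole cents to avoid floating-point rounding.

CLINICIANS = 3_000                    # from the article
MULTIPLIERS = (5, 10)                 # from the article: 5x to 10x a single-model RAG query

# Everything below is an ASSUMPTION for illustration only.
QUERIES_PER_CLINICIAN_PER_DAY = 20    # hypothetical usage level
WORKING_DAYS_PER_YEAR = 250           # hypothetical
BASELINE_COST_CENTS_PER_QUERY = 1     # hypothetical: 1 cent per single-model RAG query


def annual_inference_cost_usd(multiplier: int) -> int:
    """Annual inference cost in whole dollars for a given cost multiplier."""
    queries_per_year = (
        CLINICIANS * QUERIES_PER_CLINICIAN_PER_DAY * WORKING_DAYS_PER_YEAR
    )
    total_cents = queries_per_year * BASELINE_COST_CENTS_PER_QUERY * multiplier
    return total_cents // 100


for m in MULTIPLIERS:
    print(f"{m}x: ${annual_inference_cost_usd(m):,}")
# 5x: $750,000
# 10x: $1,500,000
```

Under these assumed inputs the range lands in the high six to low seven figures. The point of the sketch is the shape of the calculation, not the specific dollar amounts.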

That article was written even before the price inflation we are seeing in token consumption for the latest rounds of commercial models. My note at the bottom of that article was direct: extrapolate that across multiple use cases and workflows, and the budget impact becomes very real, very fast.

That was just ten months ago, in a specific healthcare use case context. Gartner's more recent analysis confirms the industry-wide direction: agentic AI workflows consume 5 to 30 times as many tokens per task as standard generative chat.

The vendors’ current pricing architecture is not a surprise. It is the commercial consequence of economics that were already visible to anyone familiar with the stack and running the numbers.

Three organizations, three versions of the same problem

Enterprise

The seat-plus-consumption model is now the standard architecture at scale. The practical consequence is that AI has shifted from a software budget line item to a hybrid of software licensing and cloud infrastructure spending. Finance teams that are not already tracking token consumption and overage patterns alongside seat counts are flying partially blind.

Benchmarkit’s 2025 survey of 372 enterprise organizations found that only 15 percent of companies could forecast AI costs within 10 percent of actual spend. Nearly one in four missed by more than 50 percent. The companies that closed that gap fastest had treated AI cost management the same way they treat cloud cost management — with dedicated tooling, usage reporting, and regular architecture reviews.

Public sector

The GAO’s 2026 review of AI adoption across federal agencies reported that officials consistently said it was challenging to understand AI costs because vendor pricing models were evolving too rapidly to budget against reliably. The Army's XM-30 program received a proposal quoting AI licensing fees of roughly $300,000 per vehicle, per year. Against the program's eventual planned fleet of around 3,800 vehicles, that is north of a billion dollars annually in licensing alone, before a single vehicle is acquired.
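The XM-30 arithmetic is simple enough to check directly. The sketch below uses the two approximate figures quoted above. The function name is hypothetical and exists only for illustration.

```python
# Annual licensing exposure for the quoted XM-30 proposal.
# Both inputs are the approximate figures cited in the text.

FEE_PER_VEHICLE_PER_YEAR_USD = 300_000   # "roughly $300,000 per vehicle, per year"
PLANNED_FLEET_SIZE = 3_800               # "around 3,800 vehicles"


def annual_licensing_usd(fee_per_vehicle: int, vehicles: int) -> int:
    """Licensing cost per year across the whole fleet."""
    return fee_per_vehicle * vehicles


total = annual_licensing_usd(FEE_PER_VEHICLE_PER_YEAR_USD, PLANNED_FLEET_SIZE)
print(f"${total:,} per year")
# $1,140,000,000 per year
```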

Officials on Army Project Linchpin noted that buyers routinely underestimate enterprise costs by focusing on model capability and missing the infrastructure required to operate AI over time.

The VA provides the most direct ROI case study: officials retired the SoKAT suicide-prevention solution after determining it did not improve enough on existing approaches to justify the ongoing expense. Even mission-critical, well-intentioned deployments fail the value-for-money test once full operational costs are surfaced.

Procurement cycles that were not designed for software that reprices quarterly are struggling to keep up. The UK government’s AI procurement guidance now explicitly requires departments to build in lifecycle cost analysis, assess vendor lock-in risk, and account for hidden costs. That is the right framing. It arrived about two years after it was needed.

Nonprofits and NGOs

The Nonprofit Finance Fund’s 2025 survey found that 36 percent of nonprofits ended 2024 with an operating deficit, 52 percent had three months or less of cash on hand, and 86 percent were feeling inflation-driven cost pressure. In that environment, an AI subscription that morphs from a predictable $20 or $40 line item into an overage-generating metered service does not just create a budget problem. It creates a governance problem, a program trade-off problem, and sometimes a procurement halt.

TechSoup’s data shows that nearly 30 percent of smaller nonprofits cite financial constraints as their primary barrier to AI adoption, and more than 75 percent have no formal AI strategy. For organizations in that position, the commercial shift toward compute metering is not an inconvenience they can bury in operating costs. It is the difference between tooling they can use within budget and no Agentic AI at all.

Four things that matter immediately:

The right frame is not “how do we keep costs down.” It is “how do we build governance that matches the billing model we are now actually operating under.”

1. Audit actual usage against assumed usage.
Most organizations signed AI contracts based on seat counts and usage estimates that predate agentic workflows. The consumption profile of an organization running background agents, long-context document processing, and automated code generation is fundamentally different from one doing chat. If you have not reconciled the two recently, do it now.
2. Separate access cost from consumption cost in your budget model.
The seat fee is a known fixed cost. The consumption cost is variable and depends on how your teams work. They need separate budget lines, separate governance, and separate alert thresholds.
3. Build spend controls before you need them.
Anthropic, OpenAI, and Google all provide admin tooling for spend limits and overage controls. Organizations that configure these before scaling are significantly less exposed than those that discover the need after the first unexpected invoice.
4. Frame this as utility infrastructure, not software, in your internal stakeholder conversations.
The organizations that have most effectively absorbed cloud cost management did so when they stopped treating it as a procurement problem and started treating it as a joint engineering and finance problem. AI spend is following the same pattern. The teams that get ahead of it will be the ones that recognize it early.
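As one way to picture the separation between access cost and consumption cost, here is a minimal sketch of a budget model with a fixed seat line, a metered consumption line, and alert thresholds. It is a local illustration only and does not call any vendor's admin API. The class name, field names, threshold values, and example numbers are all assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class AIBudget:
    """Hypothetical monthly budget: fixed access line plus metered consumption line."""

    seats: int
    seat_fee_usd: float                 # fixed monthly fee per seat
    consumption_budget_usd: float       # monthly budget for metered usage
    alert_thresholds: tuple[float, ...] = (0.5, 0.8, 1.0)  # assumed fractions of budget

    def fixed_cost(self) -> float:
        """Known access cost: seats times seat fee."""
        return self.seats * self.seat_fee_usd

    def triggered_alerts(self, consumption_to_date_usd: float) -> list[float]:
        """Thresholds already crossed by month-to-date metered spend."""
        used = consumption_to_date_usd / self.consumption_budget_usd
        return [t for t in self.alert_thresholds if used >= t]

    def projected_month_end(
        self, consumption_to_date_usd: float, days_elapsed: int, days_in_month: int
    ) -> float:
        """Straight-line projection of metered spend to month end."""
        if days_elapsed <= 0:
            raise ValueError("days_elapsed must be positive")
        return consumption_to_date_usd / days_elapsed * days_in_month


# Example with made-up numbers.
budget = AIBudget(seats=200, seat_fee_usd=40.0, consumption_budget_usd=10_000.0)
print(budget.fixed_cost())                          # 8000.0
print(budget.triggered_alerts(8_500.0))             # [0.5, 0.8]
print(budget.projected_month_end(8_500.0, 17, 30))  # 15000.0
```

In this made-up example the seat line is exactly on plan while the metered line is on course to overshoot by half, which is why the two need separate thresholds and separate owners.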

The consumer version of this shift is a pricing inconvenience. The organizational version is a fiscal and governance gap.


Summary

That gap won’t close itself. But it is not mysterious either, and the first move is to stop measuring the wrong thing.

A token is a hopeless unit. The same job, run twice on the same input, can cost five or ten times as much depending on how often it is retried and how much new material is dragged into its context between workflow cycles. A rising bill might mean real work is getting done, or it might mean the machine is thrashing, and the two look identical on the invoice.

So stop counting tokens and start counting finished work. Cost accounting is hard, tedious work, but you need to get there. What does it cost to close one ticket, settle one claim, or review one contract?

Then set that against the number you already trust: what your own internal cost accounting says, or what an outsourcer would charge for the same job. You will be inundated with offers to parcel up your workflow and hand it off to companies claiming to already be well-tooled up with AI efficiency in those areas, and that Business Process Outsourcing path may be a viable operating path to take.
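A minimal sketch of that comparison, assuming you can attribute AI spend to a workflow and count its completed units over the same period. The class, the field names, and every number in the example are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class WorkflowPeriod:
    """AI spend and finished work for one workflow over one period (illustrative)."""

    name: str
    ai_spend_usd: float
    completed_units: int                 # tickets closed, claims settled, contracts reviewed
    benchmark_cost_per_unit_usd: float   # internal cost accounting or an outsourcer quote

    def cost_per_unit(self) -> float | None:
        """AI cost per finished unit, or None if nothing was finished."""
        if self.completed_units == 0:
            return None
        return self.ai_spend_usd / self.completed_units

    def beats_benchmark(self) -> bool:
        """True only if work was finished at or below the trusted benchmark."""
        unit_cost = self.cost_per_unit()
        return unit_cost is not None and unit_cost <= self.benchmark_cost_per_unit_usd


# Made-up figures for two workflows.
tickets = WorkflowPeriod("support tickets", 12_000.0, 4_000, 4.50)
claims = WorkflowPeriod("claims", 9_000.0, 600, 11.00)

for w in (tickets, claims):
    print(w.name, w.cost_per_unit(), w.beats_benchmark())
# support tickets 3.0 True
# claims 15.0 False
```

Note that spend with zero finished units is treated as failing the benchmark, which matches the case of a machine that is thrashing rather than working.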

Measure it that way, and the leaks show themselves: jobs that quietly retry until they pass, and context stuffed with fifty documents when five would do. And the one nobody owns up to — easy work sent to an expensive model out of laziness or plain habit.
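Those three leaks can be expressed as simple checks over per-job records, assuming your tooling logs attempts, context size, model tier, and some rating of task difficulty. The record fields, labels, and thresholds below are hypothetical and would need tuning per workflow.

```python
from __future__ import annotations

from dataclasses import dataclass

# ASSUMED thresholds for illustration; tune them per workflow.
MAX_ATTEMPTS = 2
MAX_CONTEXT_DOCUMENTS = 10


@dataclass(frozen=True)
class JobRecord:
    """One completed job as a hypothetical usage log might record it."""

    job_id: str
    attempts: int             # how many times the job ran before it passed
    context_documents: int    # documents loaded into context
    model_tier: str           # "light" or "premium" (assumed labels)
    task_difficulty: str      # "easy" or "hard" (assumed labels)


def find_leaks(job: JobRecord) -> list[str]:
    """Return the names of the cost leaks this job exhibits."""
    leaks = []
    if job.attempts > MAX_ATTEMPTS:
        leaks.append("silent_retries")
    if job.context_documents > MAX_CONTEXT_DOCUMENTS:
        leaks.append("context_stuffing")
    if job.model_tier == "premium" and job.task_difficulty == "easy":
        leaks.append("overpowered_model")
    return leaks


jobs = [
    JobRecord("job-1", attempts=6, context_documents=50,
              model_tier="premium", task_difficulty="easy"),
    JobRecord("job-2", attempts=1, context_documents=5,
              model_tier="light", task_difficulty="easy"),
]
for job in jobs:
    print(job.job_id, find_leaks(job))
# job-1 ['silent_retries', 'context_stuffing', 'overpowered_model']
# job-2 []
```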

None of it is exotic. In fact, it is boring, solid, good stewardship.

It is cloud cost management wearing a new hat, and the organizations that absorbed that discipline a decade ago already own the muscle and the playbooks. The ones treating a metered utility like a flat subscription are the ones who end up explaining a budget miss to the board.

And for my vendor friends out there: it cuts the same way from the other side of the table. If you are the one selling solutions into a hospital, know your cost-to-serve per workflow down cold before the customer’s CFO works it out for you, because the margins in healthcare will not carry a tool that costs more than it saves, and tools like that get quietly retired, as Olive Health reminded everyone.

Keep haverin’.

The Agentic Stack Wars — the full series:

Part One — Confession: Google (Finally) Just Said The Quiet Part Out Loud

Part Two — Architecture: Same Stack, Different Hoodie

Part Three — Extraction: Your AI Budget Is Already Wrong (you are here)

Part Four — Reckoning: The AI Free Lunch Was Always a Fairy Tale
