The Vendor Buys The Tokens

Ariel Agor

On April 14, 2026, HubSpot rewrote the price tag on its Breeze Customer Agent. The conversation rate dropped from one dollar to fifty cents. The headline read like a discount. The change underneath was bigger. Pricing moved from per conversation to per resolved conversation. If the agent does not solve the customer's problem inside seventy-two hours, with no human jumping in, HubSpot does not collect.

Four weeks later, on May 11, 2026, Zendesk started rolling out its own outcome-based packaging at one dollar fifty per automated resolution on committed volume and two dollars on pay-as-you-go. Intercom Fin already runs at ninety-nine cents per resolution. Salesforce Agentforce bills its credits along the same line. The whole layer of the SaaS stack that lives on top of customer support has moved, in a single quarter, from selling access to selling results.

This is not a discount. This is a transfer of risk.

The economics of AI agents stopped being a subscription problem

For two decades, B2B software companies sold predictability. You bought a seat. The seat cost the same in January as it cost in November. The vendor knew its revenue inside a percent. It built a forecast, hired against the forecast, and walked into a board meeting confident.

The economics of AI agents do not work like that, and the vendors selling them know it. An agent does not consume a seat. It consumes tokens. Tokens cost different amounts depending on which model the agent calls, how many retries it needs, how big the context window has grown, and whether the agent decides to think for fourteen seconds or for fourteen minutes. Goldman Sachs published a note this spring projecting that token demand will multiply twenty-four times by 2030.

When you bill by seat against a variable cost base, you eventually lose. So the vendors moved.

How much do agents actually cost to run? Microsoft answered that question by accident. Fortune reported on May 22 that the company started cancelling most of its direct Claude Code licenses and moving engineers to GitHub Copilot CLI by June 30. The internal data driving the cancellation: a senior team's tokens were running well above the salaries of the people consuming them. Nvidia's VP of applied deep learning said it plainly. The cost of compute for his team is far beyond the cost of the employees.

Uber put a number on the same problem in April. Its CTO confirmed the company had burned through its entire 2026 AI coding tool budget in four months. The company had spent the prior year encouraging that exact behavior with internal leaderboards ranking teams by token consumption.

Meta runs the same scoreboard under a different name. A program called Claudeonomics tracked usage; the company logged more than sixty trillion tokens in a single thirty-day window. Peter Steinberger, the OpenClaw creator now at OpenAI, said his team of three burned through more than 1.3 million dollars in tokens in one month. A separate report on May 29 described an enterprise customer spending 500 million dollars on Claude usage in a single billing cycle.

Tokens are the bill. The bill is enormous. And the vendor just agreed to swallow whatever portion of that bill goes to a conversation the customer decides was not resolved.

Why a vendor would take that bet

The obvious question is why HubSpot, Zendesk, Intercom, and the others would willingly move from a predictable revenue model to a variance-heavy one. The answer is that the alternative was worse.

Buyers had started doing the math. A team of fifty support reps paying eighty dollars a seat per month is four thousand dollars a month. The same team augmented with an AI agent at one dollar per conversation, handling twenty thousand monthly tickets, was a twenty thousand dollar monthly line item on top of that, with no clear evidence the agent was earning it. So buyers were running pilots, getting confused by the bill, and refusing to convert. The pilot penalty was real and it was strangling growth.
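The buyer's math can be written out as plain arithmetic. The sketch below uses only the figures quoted in the paragraph above and assumes both prices are monthly; the function names are illustrative.

```python
def monthly_seat_cost(seats: int, price_per_seat: float) -> float:
    """Flat monthly cost of seat licenses."""
    return seats * price_per_seat


def monthly_agent_cost(conversations: int, price_per_conversation: float) -> float:
    """Monthly cost of an agent billed on every conversation it handles."""
    return conversations * price_per_conversation


# Figures from the text: fifty seats at $80, twenty thousand tickets at $1.
seat_bill = monthly_seat_cost(seats=50, price_per_seat=80.0)
agent_bill = monthly_agent_cost(conversations=20_000, price_per_conversation=1.0)

print(f"seats: ${seat_bill:,.0f} per month")
print(f"agent: ${agent_bill:,.0f} per month")
print(f"agent bill is {agent_bill / seat_bill:.1f}x the seat bill")
```

On these inputs the per-conversation agent is a line item several times the size of the seat licenses it sits beside, which is the confusion the pilots kept running into.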

Outcome-based pricing fixed the buyer's problem. It did not fix the vendor's problem. It moved the vendor's problem inside the contract.

A vendor selling resolved conversations at fifty cents each now lives by three numbers. How many input and output tokens does the average resolution take? What is the cost per million tokens of the model that runs each step? What percent of attempted conversations end in resolution rather than handoff? A bad day on any of those numbers puts the unit economics underwater.
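Those three numbers combine into a single margin figure. The sketch below is one way to write it down, not any vendor's actual model. It assumes tokens are counted per attempted conversation, so that unresolved attempts still cost money, and the 40,000-token and sixty percent figures are hypothetical inputs chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class ResolutionEconomics:
    price_per_resolution: float     # dollars billed per resolved conversation
    tokens_per_attempt: int         # average input + output tokens per attempt (assumed)
    cost_per_million_tokens: float  # blended dollars per million tokens
    resolution_rate: float          # share of attempts that end resolved, 0 to 1

    def cost_per_attempt(self) -> float:
        return self.tokens_per_attempt / 1_000_000 * self.cost_per_million_tokens

    def cost_per_resolution(self) -> float:
        # Unresolved attempts earn nothing, so their cost lands on the resolved ones.
        return self.cost_per_attempt() / self.resolution_rate

    def margin_per_resolution(self) -> float:
        return self.price_per_resolution - self.cost_per_resolution()


# Same price, same traffic, two different token prices (hypothetical scenario).
cheap = ResolutionEconomics(0.50, 40_000, 2.31, 0.60)
costly = ResolutionEconomics(0.50, 40_000, 18.40, 0.60)

print(f"margin at $2.31 per million tokens:  {cheap.margin_per_resolution():+.3f}")
print(f"margin at $18.40 per million tokens: {costly.margin_per_resolution():+.3f}")
```

Dividing by the resolution rate is the design choice that matters here: every handoff is a cost with no revenue attached, so a falling resolution rate raises the cost of each paid resolution even when token prices hold still.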

Deloitte's CFO guide to token economics, published this spring, put a benchmark on the gross margin band. Below ten percent of revenue spent on inference is the territory of healthy agentic features. Above twenty-five percent is the territory of features funding themselves out of margin. The implication is that vendors selling outcomes have to be inference-cost engineers first and product companies second.
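The two thresholds quoted above can be turned into a simple check. This is a minimal sketch: the ten and twenty-five percent cut-offs come from the text, while the "in between" label and the dollar amounts are assumptions added for illustration.

```python
def inference_share(inference_cost: float, revenue: float) -> float:
    """Inference spend as a fraction of the revenue it supports."""
    if revenue <= 0:
        raise ValueError("revenue must be positive")
    return inference_cost / revenue


def margin_band(share: float) -> str:
    """Classify an inference share against the two thresholds quoted in the text."""
    if share < 0.10:
        return "healthy"
    if share > 0.25:
        return "funded out of margin"
    return "in between"  # the text does not name this middle band


# Hypothetical month: $100,000 of resolution revenue at two inference bills.
print(margin_band(inference_share(8_000, 100_000)))
print(margin_band(inference_share(30_000, 100_000)))
```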

The new operating discipline

Inference cost is not constant. It moves with model choice, prompt design, retrieval architecture, retry policy, context compression, and tool-call discipline. None of these are marketing problems. All of them are engineering problems that compound directly into gross margin.

A study cited in the Deloitte report found that organizations routing every workload to frontier models paid 18 dollars 40 per million tokens. Organizations running tiered architectures, calling cheap models for simple steps and expensive models only when needed, hit a blended cost of 2 dollars 31 per million tokens. The gap is roughly eight to one. The same agent doing the same work, costing eight times more, depending solely on how the team architected the routing layer.
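A blended rate is just a weighted average of tier prices. The sketch below shows how a two-tier split could land near the tiered figure quoted above. The 18.40 price is from the text; the fifty-cent small-model price and the ten percent frontier share are hypothetical values chosen to illustrate the mechanism, not numbers from the study.

```python
def blended_cost_per_million(tiers: list[tuple[float, float]]) -> float:
    """Weighted average price.

    Each tier is (share_of_tokens, dollars_per_million_tokens).
    Shares must sum to 1.
    """
    total_share = sum(share for share, _ in tiers)
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("token shares must sum to 1")
    return sum(share * price for share, price in tiers)


FRONTIER_PRICE = 18.40  # from the study quoted above
CHEAP_PRICE = 0.50      # hypothetical small-model price, not from the source

all_frontier = blended_cost_per_million([(1.0, FRONTIER_PRICE)])
tiered = blended_cost_per_million([(0.10, FRONTIER_PRICE), (0.90, CHEAP_PRICE)])

print(f"all frontier: ${all_frontier:.2f} per million tokens")
print(f"tiered:       ${tiered:.2f} per million tokens")
print(f"ratio:        {all_frontier / tiered:.1f} to 1")
```

Under these assumptions, sending roughly one token in ten to the frontier model is enough to produce a gap of about eight to one.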

When you sell seats, a router that saves sixteen dollars per million tokens is a nice optimization. When you sell resolutions at fifty cents each, that router is the entire margin.

This is why outcome-based pricing is a structural shift, not a marketing tweak. It pushes every layer of the AI agent stack into operational discipline. The companies that win at this look more like cloud infrastructure operators than they look like SaaS vendors. They run capacity planning. They monitor latency tails. They split traffic across model providers. They negotiate volume discounts directly with foundation labs. They build private inference clusters when the math says ownership beats rental. The Microsoft and OpenAI relationship is the public version of the same negotiation every serious agent vendor is now having with its model supplier.

Agentic AI consumes up to a thousand times more tokens than a single LLM query, according to Tom's Hardware's reporting on the tokenmaxxing problem. The reason is structural. Agents plan, read, write, check, retry, call external tools, and may work for minutes or hours on a task, with input tokens, output tokens, context, cache, and reasoning accumulating at each step. Every step is a bet the vendor placed against the resolution outcome it promised at signing.
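The accumulation is easy to see in a toy model. The sketch below assumes each step resends the whole context so far as input and then appends its own output plus a tool result. It ignores prompt caching and context compression, and every token count in it is hypothetical.

```python
def agent_token_usage(
    steps: int,
    prompt_tokens: int,
    output_tokens_per_step: int,
    tool_tokens_per_step: int,
) -> tuple[int, int]:
    """Return (total_input_tokens, total_output_tokens) for a multi-step run."""
    context = prompt_tokens
    total_in = 0
    total_out = 0
    for _ in range(steps):
        total_in += context                 # the full context is read again each step
        total_out += output_tokens_per_step
        # The step's output and the tool result join the context for the next step.
        context += output_tokens_per_step + tool_tokens_per_step
    return total_in, total_out


single = agent_token_usage(1, 2_000, 500, 1_500)
agent = agent_token_usage(20, 2_000, 500, 1_500)

print(f"single query: {sum(single):,} tokens")
print(f"20-step run:  {sum(agent):,} tokens")
print(f"multiple:     {sum(agent) / sum(single):.0f}x")
```

Input tokens grow with the square of the step count in this model, which is why a longer task costs far more than its step count alone would suggest.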

What the buyer just bought

The buyer side of outcome-based pricing is less obvious and more important. Buyers think they bought predictability. They did not. They bought a different kind of variance.

Under per-seat pricing, the buyer's monthly bill was flat and the value was variable. Some seats were heavy users, some were light, but the cost did not move. Under per-resolution pricing, the cost tracks the value. If support tickets surge during a product outage, the bill surges with them. If the team rolls out a new feature that triggers ten thousand confused customer questions, the bill arrives the same week. The variance did not disappear; it moved onto the line item that used to be fixed.

This has two consequences buyers are now figuring out in real time.

First, finance teams that priced AI agents into their 2026 budget on a per-conversation basis are getting different numbers than they expected. The elvex enterprise AI budget control report from May found that only fifteen percent of enterprises can forecast AI costs to within plus or minus ten percent. Almost a quarter of companies miss their forecast by more than fifty percent. Outcome pricing does not solve forecast error. It moves the source of the error from token counting to demand prediction. Most finance teams are worse at demand prediction than they were at counting tokens.

Second, every dispute about whether an interaction was resolved is now a billing dispute. The Zendesk and Intercom resolution definitions are not the same as the HubSpot definition. The buyer who runs three platforms is running three different resolution accounting systems and reconciling them by hand. CMSWire flagged this exact concern when the HubSpot rollout went live. The economics of AI agents now include arbitrage between vendor definitions of success.
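To see why definitions turn into billing disputes, run the same conversation log through two rules. In the sketch below, the first rule follows the seventy-two-hour, no-human condition described at the top of this piece. The second rule, the field names, and the five sample conversations are hypothetical and do not represent any vendor's contract language.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Conversation:
    hours_to_close: Optional[float]  # None if the conversation never closed
    human_intervened: bool
    customer_confirmed: bool


def resolved_within_72h_no_human(c: Conversation) -> bool:
    """Rule A: closed inside 72 hours with no human stepping in."""
    return (
        c.hours_to_close is not None
        and c.hours_to_close <= 72
        and not c.human_intervened
    )


def resolved_customer_confirmed(c: Conversation) -> bool:
    """Rule B (hypothetical): the customer confirmed the fix and no human stepped in."""
    return c.customer_confirmed and not c.human_intervened


log = [
    Conversation(2, False, True),       # billable under both rules
    Conversation(50, False, False),     # rule A only: closed in time, never confirmed
    Conversation(90, False, True),      # rule B only: confirmed, but after 72 hours
    Conversation(1, True, True),        # neither: a human stepped in
    Conversation(None, False, False),   # neither: never closed
]

billable_a = sum(resolved_within_72h_no_human(c) for c in log)
billable_b = sum(resolved_customer_confirmed(c) for c in log)
disputed = sum(
    resolved_within_72h_no_human(c) != resolved_customer_confirmed(c) for c in log
)

print(f"rule A bills {billable_a}, rule B bills {billable_b}, {disputed} in dispute")
```

Both rules bill the same number of conversations here, yet they disagree on two of the five. Matching totals can hide the disagreement, so reconciliation has to happen per conversation rather than per invoice.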

Where this leaves the strategy

For the executive making a decision in May 2026, the outcome-based shift creates four questions worth answering before signing anything.

The first is whether the vendor you are buying from has the engineering depth to survive its own pricing model. Vendors who moved to outcome pricing without rebuilding their inference architecture are running quiet losses on every resolution. They will either raise prices, reintroduce per-seat floors, or fail. Pick the vendor whose CTO has talked publicly about routing, caching, and model selection. Avoid the vendor whose pricing announcement was written by the CFO without engineering buy-in.

The second is whether your own workflows generate the kind of clean agent traffic that resolves cheaply. A customer base that asks the same fifty questions over and over is a profitable customer base for the vendor and a stable bill for the buyer. A customer base that asks novel, ambiguous, multi-step questions generates retries, escalations, and runaway token consumption. The vendor will charge for that one way or another. If your product is complex, the bill will reflect it.

The third is whether you are willing to be measured the way your vendor now measures itself. Outcome pricing changes what the agent optimizes for. It optimizes for resolution as the vendor has defined it, not necessarily resolution as your customer experienced it. An agent that closes a conversation by saying "please contact support directly" might still count as resolved under some definitions. Read the contract. Read it twice.

The fourth is the build versus buy question, recast. Per-seat pricing made buying easy because the math was readable. Outcome pricing makes buying harder because the math depends on traffic patterns you do not fully control. For a company with significant interaction volume, the math now favors owning the agent layer at least partially. The companies treating their agent stack as core infrastructure, with their own retrieval, their own routing, their own evaluation, end up with cost curves the vendor cannot match.

The deeper move underneath the price tag

Underneath the pricing change is a deeper restructuring of what software is for. For three decades, business software sold itself as a tool that humans operated. The unit of value was the human user. The seat was the proxy. The pricing was the contract.

Agents collapse that frame. The agent operates the tool, often without a human in the loop. There is no seat to charge for because there is no seat in the conventional sense. The vendor and the buyer both know this. Outcome pricing is the first honest attempt at a contract that fits the new shape. It will not be the last. The next move, already starting in usage-tier hybrids and committed-volume discounts, is toward contracts that look more like manufacturing supply agreements than like SaaS subscriptions. Capacity. Throughput. Service levels. Penalties.

The economics of AI agents have stopped being a software conversation. They are becoming a capacity planning conversation. Companies that have never thought about themselves as having a capacity planning function are about to discover they need one.

What to do about it before quarter close

A practical, near-term checklist for any operator approaching a renewal or a new agent contract in the next ninety days.

Audit your existing AI agent contracts. Identify which ones are still per-seat and which have shifted. The ones that shifted likely changed terms beyond the price line. Read the resolution definition.

Forecast agent-driven demand the way you forecast warehouse demand. Seasonality, product release cycles, marketing pushes, customer cohort age. Outcome-priced agents are demand-sensitive in ways seat-priced software never was.
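A demand-driven bill forecast can start very simply. The sketch below multiplies a baseline ticket volume by a seasonal index and an event uplift, then prices the expected resolutions. Every input is a hypothetical placeholder to be replaced with your own history, and the fifty-cent price is only an example rate.

```python
def forecast_bills(
    baseline_tickets: float,
    seasonality: dict[str, float],
    event_uplift: dict[str, float],
    resolution_rate: float,
    price_per_resolution: float,
) -> dict[str, float]:
    """Forecast the per-resolution bill for each month in `seasonality`."""
    bills = {}
    for month, index in seasonality.items():
        # Seasonal index scales the baseline; a launch or campaign adds uplift on top.
        tickets = baseline_tickets * index * (1 + event_uplift.get(month, 0.0))
        bills[month] = tickets * resolution_rate * price_per_resolution
    return bills


bills = forecast_bills(
    baseline_tickets=20_000,
    seasonality={"Oct": 1.0, "Nov": 1.2, "Dec": 1.5},  # hypothetical seasonal indices
    event_uplift={"Nov": 0.25},                        # hypothetical product launch
    resolution_rate=0.60,
    price_per_resolution=0.50,
)

for month, bill in bills.items():
    print(f"{month}: ${bill:,.0f}")
```

The useful part is the structure rather than the numbers: each driver named in the checklist item becomes an explicit input that finance can challenge.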

Build a small internal team that owns the routing layer between your business and your AI vendors. Even one engineer who understands token economics and model selection is worth ten consultants explaining gross margin. The companies winning this are insourcing the inference architecture, even when they are still buying the agent product on top.

Stop pretending the bill will shrink as foundation models get cheaper. Token unit cost is falling roughly sixty to seventy percent per year on inference. Token consumption per task is growing faster than that. The net is up and to the right. Plan accordingly.
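The net effect is a product of two rates. The sketch below takes the sixty to seventy percent annual decline from the paragraph above; the fourfold growth in tokens per task is a hypothetical input, included only to show which way the result tips.

```python
def cost_per_task_multiplier(unit_cost_decline: float, consumption_growth: float) -> float:
    """Year-over-year change in cost per task.

    unit_cost_decline: fractional drop in price per token (0.65 means 65% cheaper).
    consumption_growth: multiple of tokens used per task (4.0 means four times as many).
    """
    return (1 - unit_cost_decline) * consumption_growth


def breakeven_growth(unit_cost_decline: float) -> float:
    """Token growth per task that exactly cancels the price decline."""
    return 1 / (1 - unit_cost_decline)


for decline in (0.60, 0.70):
    print(f"{decline:.0%} cheaper tokens: break-even at {breakeven_growth(decline):.2f}x usage")

# Hypothetical year: tokens 65% cheaper, tasks using four times as many.
print(f"cost per task changes by {cost_per_task_multiplier(0.65, 4.0):.2f}x")
```

At the quoted decline rates, tokens per task need to grow somewhere between two and a half and three and a third times a year before the bill starts rising. Anything above that is up and to the right.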

The architect's choice

There is a path through this that does not require you to choose between paying the vendor for outcomes you cannot predict and building everything from scratch. The path is architectural. You decide which parts of the agent stack you own, which parts you rent, and which parts you compose. You build the connective tissue. You buy the components that someone else can make commodity. You evaluate the result against your own definition of success, not the vendor's.

This is what we mean when we say AI is an architecture problem. The price tag is not the problem. The shape of the deal is the problem. The shape of the deal is downstream of how you decided to compose the stack. If you let the vendor design the composition, the vendor wins the margin. If you design the composition, you keep the margin.

The companies that treat the May 2026 pricing shift as a procurement event will spend the next year arguing with billing reports. The companies that treat it as the structural signal it is will spend the next year building the composition layer that lets them choose vendors, swap models, audit resolutions, and operate inference cost as a first-class business metric. There is no off-the-shelf product that does this for you, and there will not be one. The architecture is the work.

At Agor AI Advisory, we build that architecture with you. We start from your traffic, your unit economics, and the contracts on your desk this quarter. We design the routing, evaluation, and ownership boundaries that let your agents earn their keep without surrendering the margin to a vendor whose interests are not yours. The pricing page will keep moving. The architecture decisions you make now will determine whether the next move costs you a quarter or wins you a decade.

Schedule a strategic consultation with us today.
