The metric that matters in your AI strategy this quarter is not accuracy. It is not latency. It is not even cost per token in isolation. It is a number that almost no boardroom tracked twelve months ago and almost every infrastructure executive now mentions by name: tokens per watt per dollar.
"The new, top-of-mind metric discussed in industry circles is 'tokens per watt per dollar.' This new focus means it is no longer about simply using less energy, but about using energy as efficiently as possible." That passage, from a Schneider Electric and LiquidStack roundtable published last month by Data Center Knowledge, is the quiet repricing of the entire AI economy. Read it twice. Then look at your own AI budget and ask whether anyone in your company can answer the question it implies.
Most cannot. Most CFOs are still tracking AI spend as a SaaS line item: dollars per seat, dollars per call, dollars per million tokens. That framing was workable in 2024. In May 2026 it hides the variable that now decides who wins.
The shift no one announced
Look at what the hyperscalers did in the last 90 days, then ask yourself what they know that your strategy deck does not.
On 26 February 2026, Meta signed a multibillion-dollar deal with Google for access to its Tensor Processing Units (TPUs), a significant shift in its AI infrastructure strategy away from single-supplier dependency on Nvidia. At that point, Meta's projected capital expenditure on AI infrastructure for 2026 was between $115 billion and $135 billion, nearly double the $72 billion spent the previous year. Then on May 3 and 4, two more pieces landed. The deal to rent Google's AI chips was reported as a move driven by the global memory shortage that has increased the cost of AI capital expenditure. And Meta's AI capex guidance for 2026 was raised to a range of $125 billion to $145 billion, reflecting the escalating costs of building the "Superintelligence Labs" required to keep pace with Google and OpenAI.
A hundred and forty-five billion dollars. One company. One year. For physical things made of copper, silicon, and concrete sitting next to power lines.
The labs were busy too. OpenAI raised $122B at an $852B valuation, anchored by Amazon, Nvidia, SoftBank, and Microsoft. Anthropic took an additional $40B from Google and $5B from Amazon (packaged with $100B of AWS spend), and signed chip deals with Google and Broadcom reportedly worth hundreds of billions.
Between them, the labs raised more than $160 billion in a single quarter, and they did not raise it for research. They raised it for substations.
Why the bottleneck moved
For the last three years, the limiting factor on AI capability was the model. Bigger model, better model, smarter agent. That bottleneck broke. Chinese labs released open-weights models that landed at roughly the same capability ceiling on agentic engineering as the Western frontier, at meaningfully lower inference cost. None of them costs more than a third of what Claude Opus 4.7 does. The Western incumbents responded by shipping faster. OpenAI shipped GPT-5.5 on April 23, 2026, only six weeks after GPT-5.4. Anthropic has been equally aggressive, shipping four major Claude updates in roughly 50 days during early 2026.
When five labs ship near-frontier models every six weeks, the model stops being the moat. Compute does. And compute, at the scale these companies operate, is electricity in a building.
The defining risk for AI data center expansion has shifted from computational efficiency to the physical availability of grid-scale power, creating a primary bottleneck for commercial growth. The sheer density of AI workloads now presents a systemic challenge to regional electricity grids that were not designed for such concentrated, high-magnitude loads. This has transformed site selection from a function of latency and fiber access to a critical search for available megawatts.
Read that as a strategy memo, not an engineering note. The site selection criterion for the most valuable real estate in capitalism just changed. Fiber used to matter most. Now power does. Power constraints are extending data center construction timelines by 24 to 72 months. At the top of that range, six years. A construction queue measured in the lifetime of a typical startup.
And the queue is real. Grid connections take even longer. In the mid-Atlantic and Midwest region that PJM Interconnection serves, projects approved in 2025 had been in the queue for 8 years.
What this means for your P&L
Most operators read these numbers and conclude they are spectator sport. Meta's capex is not their problem. They are wrong, and the mechanism is simple.
When the marginal supply of compute is rationed by physical electricity, the price of inference does not fall on the smooth curve everyone assumed. It steps. It steps when a substation comes online in Phoenix. It steps when a behind-the-meter gas turbine clears permitting in Texas. It steps when Anthropic signs a chip deal that pulls 8 gigawatts off the available pool for the next four years. Between those steps, prices can rise. They are rising right now, and the bill is being passed to anyone running production agentic workloads.
The exposure is also growing. If 2025 was the year of the computer-use agent, 2026 will be the year of computer-use agent training, and training requires verifiers. Verifiers run continuously. They are not the once-a-day chatbot call you budgeted for. They are the background hum of every agent your company runs, every minute of every day. The token bill goes from a SaaS line to a utility line, and utility lines respond to grid pricing.
If your AI cost model assumes the API price you saw in January, you are already wrong.
The two paths the hyperscalers chose
Watch what they did, because their playbook is now the playbook for any company with a non-trivial agent footprint.
Path one: secure power directly. First, they are shifting multi-billion dollar investments to power-rich regions, such as Microsoft's $15.2 billion commitment in the UAE and Meta's $10 billion campus in Louisiana. Second, they are forging direct energy procurement partnerships, like Microsoft's Power Purchase Agreement (PPA) for 150 MW of dedicated wind power, to secure their own energy supply and bypass grid limitations.
Path two: bring your own power. Hyperscalers know they must bring their own power (BYOP), often by placing those generating assets off the grid right where data centers are built ("behind the meter" or "BTM"). Most facilities will still need a grid connection for backup, but that's not the same as depending on the grid for power 24/7.
The vocabulary is new. BYOP. BTM. Power purchase agreement. Capacity auction. These are not phrases that belong in an AI strategy document, except now they do, because they decide what you can run and what you cannot.
Notice what neither path involves: a better model. The companies with the best models are spending the most money on the thing that is not the model. That is the tell.
The "tokens per watt per dollar" mental model
Here is how to use the metric in practice.
Take any workload you currently run on a frontier API. Call it a customer support agent, a research summarizer, a code reviewer. Three numbers describe its economics:
Tokens consumed per task. Watts drawn at the data center to produce those tokens. Dollars billed to you per million tokens.
The first number you can measure. The second is a cost your vendor pays, not one you pay directly. The third is your invoice. The vendor's margin is the spread between the third number and the cost of the second: what they bill you, minus what those watts cost at their site's electricity price.
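As a rough sketch of how the three numbers combine, the snippet below computes a per-task cost, a vendor-side energy cost, and one possible reading of the headline metric. Everything in it is an assumption for illustration: the class name, the field names, and the sample values are hypothetical, and energy is expressed in watt-hours because electricity is billed as energy rather than as instantaneous power. The spread it reports ignores hardware, staff, and every other vendor cost.

```python
from dataclasses import dataclass


@dataclass
class WorkloadEconomics:
    """Hypothetical container for the three numbers; all fields are illustrative."""
    tokens_per_task: int            # measured from your own logs
    watt_hours_per_million: float   # vendor-side energy per 1M tokens (an estimate)
    price_per_million_usd: float    # the rate on your invoice

    def cost_per_task_usd(self) -> float:
        """What one task costs you at the invoiced rate."""
        return self.tokens_per_task / 1_000_000 * self.price_per_million_usd

    def vendor_energy_cost_per_million_usd(self, usd_per_kwh: float) -> float:
        """What the vendor pays for electricity to produce 1M tokens."""
        return self.watt_hours_per_million / 1000 * usd_per_kwh

    def vendor_spread_per_million_usd(self, usd_per_kwh: float) -> float:
        """Invoice rate minus the vendor's energy cost only."""
        return self.price_per_million_usd - self.vendor_energy_cost_per_million_usd(usd_per_kwh)

    def tokens_per_wh_per_usd(self) -> float:
        """Tokens per watt-hour, divided by the dollars-per-million rate."""
        tokens_per_wh = 1_000_000 / self.watt_hours_per_million
        return tokens_per_wh / self.price_per_million_usd


# Made-up sample values for a support agent workload.
support_agent = WorkloadEconomics(
    tokens_per_task=4_000,
    watt_hours_per_million=500.0,
    price_per_million_usd=15.0,
)

# Show how the vendor's spread moves as the electricity price at their site moves.
for usd_per_kwh in (0.05, 0.10, 0.20):
    spread = support_agent.vendor_spread_per_million_usd(usd_per_kwh)
    print(f"electricity at ${usd_per_kwh:.2f}/kWh -> vendor spread ${spread:.2f} per 1M tokens")
```

The useful part is not the absolute figures, which are invented here, but the habit of tracking the same metric per workload over time so that a change in either the invoice rate or the vendor's energy position shows up as a number.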
When electricity prices rise, that margin compresses. The vendor has three options. Raise prices. Cut quality (smaller model, fewer thinking tokens, lower accuracy). Or run the workload at a different site with cheaper power. Elon Musk's own AI company, xAI, launched Grok 4.3 at an "aggressively low price". The new model features a powerful voice cloning suite and a specialized "Imagine" agent mode for creative projects, representing a calculated bet that the market wants specialized, cost-efficient brilliance over balanced generalists.
That phrase, "aggressively low price", is a leading indicator. xAI is signalling that they have solved the power equation, at least temporarily. The companies that have not solved it are quietly walking prices up by switching their default routing to smaller models. You may have noticed your agents got dumber in March. That was not a regression. That was your vendor managing tokens per watt per dollar on your behalf, without telling you.
The Zhipu signal
The clearest data point in the last 90 days is also the most undercovered. Zhipu AI released GLM-4.7, a model trained entirely on Huawei Ascend silicon with a 1.2% hallucination rate, the lowest reported by any frontier lab. It costs $0.11 per million input tokens compared to Claude Opus at $15.
More than a hundred and thirty times cheaper. Trained on chips the West cannot buy. Hallucination rate lower than the frontier. If you read that and your first thought was "must be inferior on benchmarks", you missed the point. The point is that the cost curve of intelligence is not what the Western infrastructure spend assumes. There is a path to good-enough reasoning that does not require $145 billion of capex. China found it. Your competitors will find it next.
For a Western operator, the implication is uncomfortable. Locking yourself into a single frontier vendor at $15 per million tokens, on the bet that prices fall, is a bet against a country that has already shown the prices can fall by two orders of magnitude. The vendor knows this. It is highly likely that Google's TPU commercialisation will accelerate, as its deal with Meta will encourage other companies such as Microsoft and Amazon to form partnerships. Everyone is diversifying their compute supply. Everyone except you.
What an operator should actually do
This is where most consulting decks default to "evaluate your AI stack" and produce a slide with five workstreams. Skip that. Do these things instead.
First, measure your agent fleet by power, not by tokens. Ask each vendor what their data center PUE is, where the facility sits, and what their long-term PPA looks like. Vendors that cannot answer are vendors that will raise prices in the next twelve months. Vendors with locked-in renewable PPAs and behind-the-meter generation will hold. You want to know which is which before your CFO finds out from the invoice.
Second, write contracts with model portability built in. For startups, the practical takeaway is this: the model you picked three months ago may already be outdated. Build your product stack to swap models without rebuilding everything. API-first architecture is no longer optional. If your code calls one model directly, you have a hostage situation, not a vendor relationship. Route through an abstraction. Test against three families weekly. When prices step, you step with them.
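One way to picture "route through an abstraction" is a thin router that owns the list of providers, so application code never names a model directly. The sketch below is a minimal illustration under stated assumptions: `Provider`, `ModelRouter`, the `family_*` names, and the prices are all hypothetical, and the `call` field stands in for whatever adapter wraps a real vendor SDK.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Provider:
    """Hypothetical provider record; `call` wraps a vendor SDK in real use."""
    name: str
    price_per_million_usd: float
    call: Callable[[str], str]


class ModelRouter:
    """Tries providers from cheapest to most expensive, falling back on errors."""

    def __init__(self, providers: List[Provider]) -> None:
        if not providers:
            raise ValueError("at least one provider is required")
        self._providers: Dict[str, Provider] = {p.name: p for p in providers}

    def update_price(self, name: str, price_per_million_usd: float) -> None:
        # When a price steps, only this table changes; callers are untouched.
        self._providers[name].price_per_million_usd = price_per_million_usd

    def cheapest_first(self) -> List[Provider]:
        return sorted(self._providers.values(), key=lambda p: p.price_per_million_usd)

    def complete(self, prompt: str) -> Tuple[str, str]:
        """Return (provider name, response) from the cheapest provider that succeeds."""
        errors: List[str] = []
        for provider in self.cheapest_first():
            try:
                return provider.name, provider.call(prompt)
            except Exception as exc:  # broad on purpose: any failure triggers fallback
                errors.append(f"{provider.name}: {exc}")
        raise RuntimeError("all providers failed: " + "; ".join(errors))


def stub(name: str) -> Callable[[str], str]:
    """Stand-in for a real model call, used only for this illustration."""
    return lambda prompt: f"[{name}] {prompt}"


router = ModelRouter([
    Provider("family_a", 15.0, stub("family_a")),
    Provider("family_b", 5.0, stub("family_b")),
    Provider("family_c", 0.5, stub("family_c")),
])
```

A real router would weigh quality and latency as well as price, and the weekly test against three families would run the same evaluation prompts through each `Provider`. The point of the sketch is only that a price step becomes a call to `update_price`, not a code change.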
Third, treat your inference workload as a power footprint. Some workloads can be batched and run at off-peak grid hours when wholesale electricity is cheap. Some must run in real time. The split is a strategic decision, not an engineering one. Demand peaks for only a few hours per year. The rest of the time, there is plenty of capacity available. So if data centers can manage their demands on the grid to avoid peaks, they can access plenty of power without increasing peak demand at all. Your agents can do the same thing. Most operators have never thought to ask.
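The batch versus real-time split can be sketched as a small planning function: deferrable jobs go to the cheapest hour in a price forecast, and everything else runs now. This is an illustration only. The `Job` type, the job names, and the hourly prices are hypothetical, and it assumes you have some hourly price signal to plan against, whether wholesale electricity or a vendor's off-peak rate.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Job:
    """Hypothetical workload description."""
    name: str
    deferrable: bool  # True if the result is not needed in real time


def plan_run_hours(jobs: List[Job], price_by_hour: Dict[int, float], now_hour: int) -> Dict[str, int]:
    """Map each job name to the hour it should run.

    Deferrable jobs are sent to the cheapest forecast hour;
    real-time jobs run in the current hour regardless of price.
    """
    cheapest_hour = min(price_by_hour, key=lambda hour: price_by_hour[hour])
    return {job.name: (cheapest_hour if job.deferrable else now_hour) for job in jobs}


# Made-up price forecast keyed by hour of day (units are arbitrary here).
price_by_hour = {3: 12.0, 9: 48.0, 14: 95.0, 18: 140.0}

jobs = [
    Job("support_chat", deferrable=False),
    Job("nightly_summaries", deferrable=True),
    Job("code_review_backlog", deferrable=True),
]

plan = plan_run_hours(jobs, price_by_hour, now_hour=14)
```

Deciding which jobs carry `deferrable=True` is the strategic decision the paragraph above describes; the scheduling itself is the easy part.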
Fourth, watch the regulatory current. Google, Microsoft and xAI will share unreleased versions of their AI models with the government to curb cybersecurity threats, the National Institute of Standards and Technology announced on Tuesday. That announcement, dated May 5, 2026, is the first step toward a formal pre-release review regime. Pre-release review means slower deployment cycles for new models. Slower deployment cycles mean the model you have today is the model you will have for longer than you planned. The strategic value of being able to switch providers just went up.
The wrong question and the right one
The wrong question is: "Which model should we standardize on?"
That question made sense when models were the moat. They are no longer the moat. Five labs ship comparable frontier models on a six-week cycle. Open-weights Chinese models match them at a third of the cost or less. The model layer is commoditizing in real time.
The right question is: "What is our cost of intelligence per unit of business outcome, and which physical and contractual structures protect that cost from the next three steps in the power market?"
That question forces you to look at things AI strategy decks rarely look at. Substation queues in the region your vendor operates. PPA expiration dates. The capacity auction price at PJM Interconnection. Costs in one large region in the mid-Atlantic and Midwest, which is supplied by the regional transmission organization PJM Interconnection, rose from ~$60/kWh in 2024 to more than $300/kWh in 2025. A five-fold rise. That cost flows through to inference prices on a lag of months, not years. If you have agentic workloads that became economic at $15 per million tokens, ask what happens to your unit economics at $25. Or $40.
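The question about $25 or $40 is simple arithmetic, and it is worth running for each workload. The sketch below is a minimal illustration; the value per task and the tokens per task are invented numbers, and the function names are hypothetical.

```python
def margin_per_task_usd(value_per_task_usd: float, tokens_per_task: int, price_per_million_usd: float) -> float:
    """Business value of one task minus its token cost at the given rate."""
    cost = tokens_per_task / 1_000_000 * price_per_million_usd
    return value_per_task_usd - cost


def breakeven_price_per_million_usd(value_per_task_usd: float, tokens_per_task: int) -> float:
    """The per-million-token rate at which the task stops paying for itself."""
    return value_per_task_usd / tokens_per_task * 1_000_000


# Assumed figures for one agentic workload (not taken from any real deployment).
VALUE_PER_TASK_USD = 0.50
TOKENS_PER_TASK = 20_000

for price in (15.0, 25.0, 40.0):
    margin = margin_per_task_usd(VALUE_PER_TASK_USD, TOKENS_PER_TASK, price)
    print(f"${price:.0f} per 1M tokens -> margin per task ${margin:.2f}")

breakeven = breakeven_price_per_million_usd(VALUE_PER_TASK_USD, TOKENS_PER_TASK)
print(f"break-even rate: ${breakeven:.2f} per 1M tokens")
```

With these assumed inputs the workload earns a margin at $15, roughly breaks even at $25, and loses money at $40. Token-heavy agents hit their break-even rate much sooner than chat-style calls, which is why the figure is worth knowing per workload rather than per vendor.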
The companies that survive the next phase are the ones that built their AI stack with the physical layer in mind. The ones that get crushed are the ones that wrote AI strategy as if it were a software purchase.
The architecture question
This is why architecture beats procurement, and why the difference matters now more than it did six months ago.
Procurement says: pick the best vendor, sign the contract, run the workload. It treats AI as a service. It optimizes the per-call price. It cannot answer the tokens per watt per dollar question because it does not see the watts.
Architecture says: design a system where the model layer, the routing layer, the data layer, and the contract layer can all flex when the underlying physical economics shift. It treats AI as infrastructure. It optimizes for resilience to price steps. It assumes the cost curve is not smooth, because the cost curve is now governed by substations and turbines, and those do not deliver capacity smoothly.
The companies that designed AI architectures in 2024 around a single frontier vendor are now spending engineering quarters retrofitting portability. The companies that designed for portability from day one are picking up market share while their competitors are renegotiating contracts.
A year from now, this gap will be obvious. Today, it is still invisible to most boards. That is the window.
The conclusion that matters
Buying a tool does not solve this. Buying a tool from any vendor, no matter how good the vendor, ties your cost of intelligence to a single physical supply chain that you do not control and cannot see. Every quarter, more of your operating cost gets routed through electricity prices in a region you may not have heard of, run by a utility you cannot name, governed by a capacity auction whose results you do not read.
Architecting solves this. Designing the routing, the fallbacks, the workload scheduling, and the contractual terms so that when the megawatts move, your business does not break. That is engineering work. That is also strategy work. Most teams treat the two as separate. They are no longer separate.
Agor AI Advisory builds the architecture, not the tool. We map your agent fleet against the physical and contractual realities of the compute markets that feed it. We design portability into your stack so that the next price step is a routing decision, not a crisis. We turn the question "which model" into the question your competitors are not yet asking: how do you own the cost of intelligence when the cost of intelligence is the cost of power?
The companies that ask this question in 2026 will compound advantage for a decade. The companies that do not will pay the megawatt margin to someone else, every quarter, until the bill becomes the business.
