← Back to Insights

Insight

The Compute Payroll

Ariel Agor
The Compute Payroll

Listen · Read by Leo · click any word to jump

0:00 / · loading…

On May 14, 2026, the illusion of infinite artificial intelligence broke. Axios reported that Anthropic quietly imposed strict new limits on paying customers. The company had to cap usage because software agents were burning through computing resources faster than any human ever could. This single policy change signals a complete structural break in software economics. The flat-rate monthly subscription model has failed.

One day earlier, on May 13, the Ramp AI Index revealed that Anthropic had surpassed OpenAI in business adoption, capturing 34.4 percent of the market. But the hidden detail in the Ramp data carries more weight than the headline. The fastest-growing vendors on corporate cards are no longer chat interfaces. They are raw inference platforms. Businesses are shifting their spend from software access to pure cognitive cycles.

We have reached the end of the software pricing model. You cannot buy artificial intelligence like you buy email. Agents reject the concept of a software seat. They consume raw compute cycles continuously. When you deploy an agent, you hire a digital worker that bills you by the token.

The enterprise must now manage compute as a variable labor cost. Companies that fail to grasp this shift will bankrupt themselves on inference bills. Companies that architect their workflows for compute efficiency will scale infinitely.

The Physics of Agency

Human software usage follows a predictable, lazy rhythm. A person opens a window. They type a command. They wait for a response. They read the text. They think. They go get coffee. The underlying server rests. This intermittent activity allows companies to sell software access for twenty dollars a month. They know the human bottleneck will protect their profit margins.

Agents remove the human bottleneck entirely. An agent operates in a continuous loop. It reads a database. It formats the data. It queries an application programming interface. It evaluates the response. It rewrites its own code. It tests the code. It fails. It tries again. It executes these steps thousands of times a minute.

Autonomy requires constant cognition. Constant cognition requires massive electrical power and specialized silicon.

When Anthropic launched Claude for Small Business on May 15, they included fifteen pre-built agentic workflows. These workflows cover finance and operations. A human manager toggles them on. The agent then runs in the background. It reads every incoming invoice. It reconciles every receipt. It updates the ledger.

That background operation consumes tokens constantly. Every time the agent looks at a document, the system calculates a cost. The agent burns compute with every decision. The Axios report makes the math explicit. Flat-rate pricing fails completely when machines do the prompting. The hardware constraints of the physical world demand a direct toll for every cycle of thought.

We must stop thinking about software as a static tool. Software now spends money autonomously. An agentic workflow acts as a financial engine that converts dollars into intelligence and intelligence into outcomes. You pay directly for the conversion rate.

The Seat License Fallacy

For forty years, the technology industry priced software by the human head. You bought access for one employee. You paid a flat annual fee. The software vendor absorbed the hosting costs. The buyer enjoyed predictable expenses. The chief financial officer loved the seat license because it made budgeting simple.

Early artificial intelligence tools copied this model. They sold access for a fixed monthly price. They assumed humans would interact with the models via chat. They assumed humans would get tired.

Agents operate without fatigue. They scale their activity to the exact limit of the system. If you give an agent a task, it will work until the task concludes. If the task requires reading one million lines of code, the agent will read them in seconds.

The seat license breaks under this intense pressure. Anthropic realized they could never offer unlimited access to autonomous systems. They tightened the limits. OpenAI immediately began courting those heavy users, attempting to win market share by offering more capacity. But the underlying physics remain identical for both companies. Compute costs real money. Someone has to pay for the electricity and the silicon.

The burden of that cost now shifts to the enterprise. When you turn on an agent, you agree to pay for its consumption. You abandon the safety of the fixed software budget.

This requires a mental rewrite for business operators. You must view an agent as an hourly employee who works at the speed of light. You pay this employee a variable piece rate based entirely on the complexity of its thought.

If you treat an agent like a traditional software tool, you will leave it running unchecked. The agent will execute loops. It will query databases. It will burn tokens. At the end of the month, the inference bill will arrive. Many companies will experience extreme shock when they see the cost of unmanaged autonomy.

The Illusion of Infinite Capacity

The early days of generative artificial intelligence trained the market poorly. The massive labs absorbed the cost of compute. They burned billions of venture capital dollars to acquire users. They gave the public the illusion that intelligence was essentially free.

When a consumer asked a model to write a poem, the lab lost money. When a developer pasted thousands of lines of code into a prompt, the lab lost money. The fixed twenty-dollar monthly fee never covered the actual cost of the electricity and the silicon required to generate the answers.

This subsidy created bad habits. Developers wrote sloppy prompts. They included massive, unnecessary documents in the context window. They relied on the most expensive models to perform trivial tasks. They did this because the labs hid the true cost.

The May 14 Axios report reveals the end of the subsidy. The labs can no longer afford to absorb the cost of agentic behavior. A human might ask ten questions a day. An agent might ask ten thousand questions an hour. The labs must enforce the laws of physics. They must pass the cost of compute directly down to the user.

This exposes the bad habits of the enterprise. Companies that built their early experiments on subsidized pricing will find those experiments economically unviable in the true compute market. They will discover that their workflows are grossly inefficient.

You must strip away the illusion of infinite capacity. You must operate your business under the assumption that every single token carries a hard cost. This mindset shift separates the serious operators from the tourists. The tourists will complain about the new limits and the rising inference bills. The serious operators will optimize their systems and turn efficiency into a weapon.

Software Becomes Infrastructure

Business software used to sit on top of infrastructure. You bought servers, and you ran applications on them. Then the cloud arrived, and you rented servers to run your applications.

Agents blur the line between the application and the infrastructure. An agent operates as an active system that consumes resources directly, rather than waiting passively for a user.

When Google DeepMind introduced their Magic Pointer concept on May 12, they illustrated this shift perfectly. They reimagined the traditional mouse pointer as an active, context-aware artificial intelligence partner powered by Gemini. The agent abandons the separate chat window, establishing presence directly at the operating system level. It meets the user across all tools, understanding the screen and taking action directly.

This means the agent becomes the infrastructure. It serves as the connective tissue between the human and the data. But unlike a traditional operating system, this connective tissue requires constant inference. Every time you move the pointer, the system thinks. The system burns compute.

If the agent acts as the infrastructure, then compute acts as the utility bill. You pay for the electrical grid by the megawatt drawn, avoiding flat subscriptions. You will pay for the agentic grid exactly the same way, tracking every token.

This demands a complete overhaul of vendor relationships. You must negotiate compute agreements instead of software licenses. You must secure guaranteed capacity. If your entire logistics network runs on agentic infrastructure, you cannot afford to have your agents throttled because your lab vendor ran out of server space during a demand spike.

You must build redundancy. You must maintain the ability to route your agentic workflows across multiple model providers. If Anthropic tightens their limits, you must instantly route your traffic to OpenAI. This multi-model architecture protects your business from vendor lock-in and pricing spikes.

Measuring the Machine Wage

The transition from fixed subscriptions to variable compute demands a new accounting discipline. You must measure the machine wage.

Every prompt carries a price tag. Every context window has a cost. When an agent reads a hundred-page document to extract one paragraph, it processes every word in that document. It charges you for the effort.

You must calculate the cost per outcome. If an agent resolves a customer support ticket for ten cents in compute, and a human resolves it for ten dollars in salary, the agent remains highly profitable. If a poorly designed agent loops endlessly and burns fifteen dollars in compute to resolve the same ticket, the system fails.

Efficiency becomes the primary engineering metric. In the human era, efficiency meant making software easier for people to use. In the agent era, efficiency means making tasks cheaper for models to process.

This changes how developers write code. They must filter data before passing it to the model. They must route simple tasks to small models. They must reserve large models only for complex reasoning.

The Ramp AI Index data from May 13 shows this behavior emerging in the market. Companies are moving their spend toward specialized inference platforms. They are building infrastructure to route and manage token consumption. They understand that raw intelligence acts as a commodity. The real value lies in buying that intelligence at the lowest possible price.

Your engineering team must become experts in token economics. They must optimize prompts the way factory managers optimize assembly lines. A single wasted word in a system prompt, multiplied by one million autonomous loops, equals a massive financial loss.

The Margin Compression Problem

This new economic reality creates a brutal competitive environment. Every company in your industry has access to the same frontier models. They can all rent intelligence from Anthropic or OpenAI. They all pay roughly the same public price per token.

If intelligence is a commodity, where do you find your competitive advantage?

The advantage lives entirely in orchestration. It lives in how you structure the work.

Consider two competing logistics companies. Both deploy agents to manage their supply chains. Both agents read shipping manifests and reroute trucks.

Company A buys an off-the-shelf agentic product. The vendor charges a premium on top of the raw compute cost. The vendor's agent uses a massive system prompt and sends every query to the most expensive frontier model. Company A pays one dollar for every routing decision.

Company B builds their own orchestration layer. They train a small, highly efficient model to handle routine manifest reading. This model costs fractions of a cent. They only invoke the expensive frontier model when a major storm disrupts the network. Company B pays two cents for every routing decision.

Company B operates with a massive structural advantage. They can deploy fifty times more agents than Company A for the exact same budget. They can monitor more data and react faster.

Company A will look at their inference bills and conclude that artificial intelligence costs too much. They will scale back their deployment. Company B will look at their highly optimized system and double their investment. Company B will eventually consume Company A.

You must own the architecture of your agents. You cannot outsource the cognitive supply chain. Renting pre-packaged autonomy forces you to surrender your margins to the vendor.

The Disappearance of the IT Budget

The shift to variable compute breaks the traditional corporate structure. Information technology departments control software budgets. They buy licenses and treat software as an operational expense.

Compute operates strictly as direct labor, wholly separate from traditional information technology expenses.

When an agent takes over account reconciliation, it performs the work of a junior accountant. The compute required to run that agent should come from the finance department's budget. When an agent generates marketing copy, the compute cost belongs to marketing.

This distributed consumption model forces the chief financial officer to rewrite the ledger. Compute must become a cost of goods sold. It must be tracked against specific business units and specific revenue streams.

Keeping compute hidden inside the centralized technology budget destroys your ability to measure return on investment. The technology department will see a massive, spiking monthly bill. They will panic. They will impose arbitrary limits on token consumption. They will choke the productivity of the entire company to protect a static budget line.

You must decentralize the compute budget. You must give business unit leaders the authority to spend tokens. You must also hold them accountable for the return on those tokens.

A sales director should receive a compute allowance. They can use that allowance to run agents that research prospects and draft emails. If those agents generate more revenue than they consume in compute, the sales director should increase the burn rate. The machine wage must scale directly with business value.

The End of Predictable Billing

For decades, the finance department demanded predictability. They wanted to know exactly how much the software stack would cost in the third quarter. The software industry grew massive by providing this exact comfort. You signed a multi-year contract. You paid a set fee. The risk of overconsumption fell entirely on the vendor.

Agents destroy this comfort. An agent acts as a worker with infinite stamina. Giving an agent access to your corporate database to find inefficiencies might trigger ten thousand parallel queries. It will read every transaction from the last five years. It will perform brilliant, necessary work. It will also generate a massive compute bill in three hours.

The finance department must abandon the desire for static billing. Predictable billing only exists when work is capped. Agents uncap the work.

Companies must implement dynamic financial controls. They need circuit breakers. If a specific agentic workflow burns through its daily compute allocation by noon, the system must pause. A human operator must review the output. If the agent generates massive value, the operator approves an increase in the compute budget. If the agent is caught in a hallucination loop, the operator kills the process.

This dynamic control requires real-time financial dashboards. You can no longer review software spend at the end of the quarter. You must review compute spend by the minute. The finance team must work directly with the engineering team to set hard limits on token consumption.

The Rise of the Small Model

Because frontier models cost too much for continuous loops, the market is shifting. We see the rapid adoption of small, specialized models.

A frontier model with a trillion parameters wastes money when reading a simple shipping address. You need a fast, cheap model that does one thing perfectly. The winning architecture involves a swarm of small models managed by one central routing intelligence.

The Ramp AI Index data highlights this trend perfectly. Businesses are increasing their spend on inference platforms that allow them to host and run these smaller models efficiently. They are building their own private cognitive factories.

When an agent needs to perform a task, the router evaluates the complexity. It sends ninety-five percent of the work to the cheap, local models. These models cost fractions of a penny per thousand tokens. The router only escalates the remaining five percent to the expensive lab models.

This tiered architecture is the only way to survive the compute payroll. Sending every query to the most capable model guarantees corporate bankruptcy. You must match the cost of the cognition directly to the value of the task.

The Economics of Autonomy

The true power of the compute payroll lies in its perfect elasticity. Human labor is rigid. You hire a person. You pay them a salary. If demand drops, you still pay the salary. If demand spikes, the person works overtime and makes mistakes. You cannot scale a human team overnight.

Agents scale instantly. If your web traffic doubles on a Tuesday, your agents spin up new instances. They handle the load. They consume exactly the compute required. On Wednesday, when traffic drops, the agents spin down. The cost drops to zero.

This perfect elasticity allows companies to pursue opportunities that were previously impossible.

Imagine a legal firm reviewing documents for a massive lawsuit. A human team might take three months to read one million pages. The firm must rent office space and manage contractor payroll. The cost and time make certain cases unprofitable to pursue.

An agentic system can read one million pages in an afternoon. The firm pays only for the tokens consumed during that afternoon. The cost is known immediately. The firm can take on cases of any size, knowing their labor force will scale perfectly to the exact dimensions of the problem.

This elasticity destroys traditional barriers to entry. Small companies can generate the operational output of massive enterprises. Anthropic explicitly targeted this dynamic with their May 15 release of Claude for Small Business. They understand that a five-person company with a well-architected agentic system can outperform a massive corporation that relies on human labor.

The small company bypasses massive fundraising for human hires, needing only enough capital to fund the compute burn rate. They convert capital directly into execution.

Architecting for Variable Intelligence

You cannot achieve this level of efficiency by accident. You must build for it.

Most companies are currently experimenting with artificial intelligence. They give their employees access to chat interfaces. They buy a few copilot licenses. They treat the technology as a novelty or a simple productivity boost.

These companies will fail. They are applying new physics to old machinery.

You must redesign the machinery. You must map every workflow in your organization. You must identify where human judgment is actually required and where it acts merely as a router between systems.

Once you map the workflows, you must build the orchestration layer. You must create the rules that govern how your agents consume tokens. You must set limits and build fallback mechanisms. If an agent gets stuck in a loop, the system must detect the repeated calls and cut the power before the compute bill explodes.

You must build memory structures. If an agent solves a complex problem on Monday, it should not burn expensive tokens to solve the exact same problem on Tuesday. It should retrieve the answer from a cheap, local database. Institutional memory reduces the compute burn rate.

You must build evaluation systems. You need dashboards that show exactly how much compute each workflow consumes. You need alerts when the cost per outcome deviates from the baseline. You must manage your digital workforce with the same rigor you apply to your physical supply chain.

This requires serious engineering. It requires strategic foresight. It requires a complete departure from the way businesses have historically bought and deployed software.

The shift from seats to cycles is permanent. The agents are here, and they are hungry for compute. The flat-rate software subscription cannot contain them. Your company will soon employ more digital agents than human workers. You will pay these agents in raw compute. Treating this transition as an information technology procurement issue guarantees failure. You will lose control of your margins. You will pay exorbitant costs for inefficient workflows. You will enrich the vendors while starving your own operations. You must take ownership of your token economics. You must architect your firm for the compute payroll. You need a partner who understands how to build the orchestration layers, the routing logic, and the efficiency metrics that turn variable compute into a massive structural advantage. Agor AI Advisory provides exactly this architectural expertise. We build the systems that control your machine wage and protect your margins.

Sources

  • [Anthropic tightens Claude limits as OpenAI courts agent users - Axios, May 14, 2026](https://www.axios.com/2026/05/14/anthropic-claude-price-openai-tokens)
  • [Anthropic beats OpenAI on business adoption - Ramp, May 13, 2026](https://ramp.com/leading-indicators/ai-index-may-2026)
  • [Anthropic sets sights on small business after enterprise push - Silicon Republic, May 15, 2026](https://www.siliconrepublic.com/business/anthropic-sets-sights-on-small-business-after-enterprise-push)
  • [Shaping the future of AI interaction by reimagining the mouse pointer - Google DeepMind, May 12, 2026]: <https://deepmind.google/blog/ai-pointer/>```