Execution is Compute

Ariel Agor

On May 7, 2026, Google DeepMind published a paper detailing the AI Co-Mathematician. The researchers connected Gemini 3.1 Pro to a stateful workspace and pointed it at the FrontierMath Tier 4 benchmark. Epoch AI designed these problems to require days of focused reasoning from expert mathematicians. The system solved forty-eight percent of them.

The benchmark score commands attention. The underlying architecture demands a total revision of business strategy.

Marc Lackenby is a mathematician. He wanted to solve Problem 21.10 from the Kourovka Notebook. The problem asks a highly specific question about finite groups. Lackenby typed the problem into the AlphaEvolve interface. He waited.

The machine generated a mathematical proof. The machine then deployed an internal reviewer agent to evaluate its own work. The reviewer found a flaw. The machine presented this flawed, flagged proof to Lackenby.

Lackenby read the output. He recognized the specific gap. He typed a correction. He told the machine exactly how to bridge the logical failure. The machine incorporated his instruction and wrote the final proof. Lackenby generalized the result. The machine reviewed the new version and found two minor errors. Lackenby fixed them. The problem was solved.

Look closely at the division of labor. Lackenby provided the initial goal. He evaluated the machine's failure. He provided the specific correction. The machine did the actual work of generating the formal mathematics. The machine executed the task.

For the last century, businesses hired humans to execute. You have a goal. You hire a team to make it real. The distance between the idea and the reality is measured in human hours. Your payroll reflects the physical cost of translation.

That model died in May 2026. Execution is now compute.

The Geometry of Labor

Execution is the act of translation. A chief executive says the company must increase revenue. A vice president translates that goal into a marketing strategy. A director translates the strategy into a campaign. A manager translates the campaign into ad copy. A specialist translates the copy into digital actions. At every single step, a human mind breaks a large abstract concept into smaller concrete actions.

This translation process mirrors the native function of modern language models. They take a high-level prompt and generate low-level tokens. They translate a broad desire into specific, executable output.

DeepMind formalized this reality in a companion paper released the exact same day. The researchers studied how expert mathematicians interacted with the AlphaEvolve system. They identified two distinct human activities that retain extreme economic value. They named them intentmaking and sensemaking.

Intentmaking is the act of defining the exact state you want the universe to reach.

Sensemaking is the act of looking at the universe and understanding what state it is currently in.

The machine handles everything between those two poles.

This geometry fundamentally breaks the modern corporate structure. Your organizational chart is entirely composed of people who operate in the space between intent and sense. You pay people to write the code. You pay people to draft the contracts. You pay people to build the financial models. You pay people to execute.

When execution becomes a compute function, the middle of your organization becomes obsolete. The machine can generate a ten-thousand-word architectural document in seconds.

Epoch AI designed the FrontierMath Tier 4 problems to require days of focused human reasoning. Gemini 3.1 Pro alone scores nineteen percent. The AI Co-Mathematician architecture scores eighty-seven percent on internal research benchmarks. The architecture, not the underlying model, creates the gap. The system gave the machine a stateful workspace and removed the hard token limits. It gave the machine time to execute.

Human workers take days to complete tasks because they sleep and get distracted. Machines finish routine tasks in seconds. When we allow a machine to work for an hour on a hard problem, generating millions of tokens, exploring dead ends, and backtracking, it achieves profound breakthroughs.

If your company pays humans to execute, you are burning capital on a commodity.

Intent and Sense

Corporate leaders assume intentmaking is easy. They assume the hard part is doing the work. This is a severe miscalculation.

Most humans never learn how to generate precise intent. Our educational systems teach execution. Our entry-level corporate jobs demand execution. We train professionals to follow instructions and complete tasks. We reward people for doing the work.

When you remove the friction of execution, you expose a terrifying void. Most professionals do not know what to ask for. They rely on the slow pace of human labor to hide their vague thinking. When execution happens instantly, poor intent becomes immediately visible.

Take a corporate acquisition. Intentmaking is defining the exact target profile. We want a logistics company in the Midwest with a specific margin profile and zero union liabilities. The machine takes this intent. The machine scrapes public filings. The machine analyzes private data room documents. The machine runs the financial models. The machine flags the regulatory risks. The machine executes the due diligence.

If you tell a machine to write a commercial lease, and your intent is vague, the machine will generate a vague lease. It will execute perfectly on your poor instructions. The human operator must learn to define strict boundary conditions. The human must specify risk tolerance. The human must articulate the unstated, invisible assumptions of the business. Intentmaking requires extreme precision.

Sensemaking requires extreme technical mastery. Once the machine executes, it returns a massive volume of output. A human must evaluate that output. The human must read the code, read the contract, and review the mathematical proof. The human must look for the subtle hallucination. The human must find the logical gap.

In the acquisition example, the machine returns a perfect summary of the data room. The human reads the summary and realizes the target company's primary supplier is exposed to geopolitical risk in South America. The machine read the contracts perfectly. The machine executed the math. The machine lacks the human intuition for global trade tensions. The human makes sense of the perfect data.

Lackenby performed pure sensemaking. The machine handed him a flawed proof. He possessed the deep mathematical knowledge required to spot the flaw. He made sense of the machine's failure.

Many executives plan to fire their senior experts and replace them with agentic systems. This is corporate suicide. The machines will generate massive amounts of execution. The output will contain subtle, catastrophic errors. Without senior experts to evaluate the output, those errors will deploy directly into your production environment. You need the experts. You simply need them doing different things. You need them verifying reality.

The Translation Problem

Execution is moving outward. It is leaving the text box and entering the physical infrastructure.

On May 12, 2026, OpenAI launched Daybreak. The initiative uses GPT-5.5-Cyber to secure software environments. The participant list includes massive infrastructure players like Cisco and Cloudflare.

The old security model relied entirely on human execution. Human analysts read server logs. Human engineers wrote patches. The attack surface grew too large. The human workforce failed to execute fast enough to stop the breaches.

Daybreak changes the verb. The machine finds the vulnerability. The machine writes the patch. The machine executes the fix. The human security team shifts entirely to intent and sensemaking. The humans define the threat models. The humans evaluate the machine's proposed patches before deployment. They orchestrate the compute.

Look at how Anthropic describes Claude Opus 4.7, released in April 2026. It is a hybrid reasoning model. It shows notable improvements in advanced software engineering. It can process voice dictation, images, and text.

When a human software engineer builds an application, they translate a business requirement into a database schema. They translate the schema into backend logic. They translate the logic into frontend components. Opus 4.7 does this across modalities. You draw a diagram on a whiteboard. You take a picture. You dictate the business rules verbally. The machine translates the image and the audio directly into a functional software application. The execution layer vanishes. The translation is instantaneous.

This pattern applies to every single department. In finance, the machine reconciles the ledger and flags the anomalies. The human decides whether to investigate the anomaly. In supply chain, the machine reroutes the cargo ships based on weather data. The human defines the acceptable cost parameters for the delay.

The machine translates the data into action. The human verifies the action matches the goal.

The Collapse of the Interface

The chat box is the absolute enemy of execution.

For three years, we trained employees to talk to machines through a chat interface. You type a prompt. The machine types an answer. You type another prompt.

This is a synchronous, human-gated workflow. It forces the machine to move at the exact speed of human typing. It caps the intelligence of the model at the patience of the user.

The DeepMind AI Co-Mathematician did not use a chat box. It used an asynchronous workspace. The human defined the intent and walked away. The machine ran for hours. It spawned sub-agents. It reviewed its own work. It compiled the final result.
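The shape of that loop can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not DeepMind's implementation: `Workspace`, `run_until_reviewed`, and the two toy functions are hypothetical names, and the "models" are stubs so the example runs without any API. The point it shows is that the machine drafts, reviews, and redrafts on its own, and the human only reads the workspace at the end.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Workspace:
    """Hypothetical stateful workspace: the goal plus every draft and review so far."""
    goal: str
    drafts: List[str] = field(default_factory=list)
    reviews: List[str] = field(default_factory=list)


def run_until_reviewed(
    goal: str,
    generate: Callable[[Workspace], str],
    review: Callable[[str], Optional[str]],
    max_rounds: int = 5,
) -> Workspace:
    """Draft, self-review, and redraft without waiting on a human between rounds."""
    ws = Workspace(goal=goal)
    for _ in range(max_rounds):
        draft = generate(ws)
        ws.drafts.append(draft)
        objection = review(draft)      # None means the reviewer found no flaw
        if objection is None:
            break
        ws.reviews.append(objection)   # later drafts can see the objection
    return ws


# Stub "models" so the sketch runs on its own.
def toy_generate(ws: Workspace) -> str:
    return f"{ws.goal} v{len(ws.drafts) + 1}"


def toy_review(draft: str) -> Optional[str]:
    return None if draft.endswith("v3") else f"gap found in {draft}"


result = run_until_reviewed("proof", toy_generate, toy_review)
print(result.drafts[-1], len(result.reviews))
```

If the round limit runs out before the reviewer is satisfied, the last draft and its objections stay in the workspace. That is the flawed, flagged output a human expert then has to make sense of.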

The enterprise of the future will abandon chat interfaces. Employees will stop talking to the AI. The AI must live in the background. It must integrate directly into the data streams. It will watch the incoming emails, monitor the supply chain feeds, and execute the responses automatically. The human will only interact with the machine through dashboards that display the machine's actions for sensemaking approval.

The chat interface served as a temporary bridge. It taught us how the models behaved. Now, the models must be decoupled from human conversation and allowed to execute asynchronously. We must let the machines run.

The Fragility of State

Outsourcing execution introduces a new physical vulnerability. When you rely on human employees, their memory provides continuity. They remember the context of the project from yesterday. They hold the state of the task in their minds.

Machines have no inherent memory. They require an external state graph: an explicit record of what was decided and why.

On May 14, 2026, Anthropic published a postmortem for Claude Code. Users had spent six weeks complaining about severe quality drops. Developers believed the underlying model had degraded.

The postmortem revealed the truth. The model weights had remained perfectly static. The problem lived entirely in the product layer. On March 26, Anthropic shipped a caching optimization. They wanted to clear older thinking from idle sessions to reduce latency and save compute. A bug caused this clearing function to fire on every single turn.

Claude kept working. It kept generating code. It completely lost the context of its own reasoning. It forgot why it had chosen a specific architectural approach. It became forgetful and repetitive.
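A toy model makes the failure mode concrete. The sketch below is an assumption-laden illustration, not Anthropic's code: the one-hour threshold, the `Session` class, and both pruning functions are hypothetical. It contrasts clearing old reasoning only after a session has been idle with clearing it on every turn.

```python
from dataclasses import dataclass, field
from typing import Callable, List

IDLE_THRESHOLD_S = 3600  # hypothetical: treat a session as idle after one hour


@dataclass
class Session:
    last_active: float
    thinking: List[str] = field(default_factory=list)  # earlier reasoning kept in context


def prune_intended(session: Session, now: float) -> None:
    """Drop old reasoning only when the session has actually been idle."""
    if now - session.last_active >= IDLE_THRESHOLD_S:
        session.thinking.clear()
    session.last_active = now


def prune_buggy(session: Session, now: float) -> None:
    """Same intent, but the idle check is missing, so every turn clears the reasoning."""
    session.thinking.clear()
    session.last_active = now


def simulate(prune: Callable[[Session, float], None],
             turns: int = 3, gap_s: float = 10.0) -> int:
    """Run a busy session and return how many reasoning notes survive."""
    s = Session(last_active=0.0)
    now = 0.0
    for i in range(turns):
        now += gap_s
        prune(s, now)                    # runs at the start of each turn
        s.thinking.append(f"note {i}")   # the turn adds new reasoning
    return len(s.thinking)


print(simulate(prune_intended), simulate(prune_buggy))
```

In the intended version, an active session keeps all of its reasoning. In the buggy version, only the most recent note ever survives, which is how a session can keep producing output while losing the thread of its own decisions.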

Boris Cherny from the Claude Code team explained publicly what the optimization was meant to address. In an extreme case, a user with nine hundred thousand tokens in their context window who idled for an hour would face a full cache miss on the next message. This consumed a massive percentage of rate limits. Anthropic's attempt to fix that cost is exactly what introduced the amnesia bug.

This event exposes the danger of renting your execution layer without controlling the architecture. Holding state is highly expensive. Vectors cost money to store. Context windows cost money to process. Vendors will always try to compress, cache, and prune your state to protect their margins. If your business depends on that state, you are entirely misaligned with your vendor.

If you rely on a vendor's interface or API wrapper to hold your project's context, you own nothing. The vendor will clear the cache. Your synthetic workers will develop sudden amnesia.

You must build your own state machines. You must hold the memory of the task on your own infrastructure. You need an independent memory layer. You need a vector database you control. You pass the context to the model, ask for the execution, and pull the result back into your own system. The model is a stateless engine. Your architecture must provide the continuity.
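A minimal sketch of that pattern follows, under loud assumptions. `TaskState`, `step`, and `echo_model` are hypothetical names, and a local JSON file stands in for the memory layer where a real system might use a database or vector store. The model is any callable that sees only the context you pass it.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict, List


class TaskState:
    """Hypothetical task memory persisted to a JSON file on infrastructure you control."""

    def __init__(self, path: Path):
        self.path = path
        self.events: List[Dict[str, str]] = (
            json.loads(path.read_text()) if path.exists() else []
        )

    def append(self, role: str, text: str) -> None:
        self.events.append({"role": role, "text": text})
        self.path.write_text(json.dumps(self.events))  # persist after every step

    def as_context(self) -> str:
        return "\n".join(f"{e['role']}: {e['text']}" for e in self.events)


def step(state: TaskState, instruction: str, model: Callable[[str], str]) -> str:
    """Send the full stored context to a stateless model and record the result."""
    state.append("human", instruction)
    output = model(state.as_context())
    state.append("model", output)
    return output


# Stand-in for any vendor call: it sees only what we pass in.
def echo_model(context: str) -> str:
    return f"saw {len(context.splitlines())} lines"


path = Path(tempfile.mkdtemp()) / "task.json"
step(TaskState(path), "choose a schema", echo_model)
reloaded = TaskState(path)  # a fresh process would reload the same file
print(step(reloaded, "now write the migration", echo_model))
```

Because the history lives in your own file, a vendor clearing its cache changes cost and latency but not what the next call knows. Swapping `echo_model` for a real API client would not change the structure.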

The Physics of Action

Execution requires energy. It requires physical resources. When humans execute, you pay for their food and shelter via a salary. When machines execute, you pay for electricity and silicon via token costs.

On May 13, 2026, Anthropic altered the rules for Claude Pro subscriptions. Users had been deploying third-party autonomous agents like OpenClaw. These agents loop. They think. They act. They execute autonomously without human prompting.

A human user might pay twenty dollars a month for a standard subscription. That user's agent would then burn hundreds of dollars of compute running continuous execution loops. Anthropic stepped in. They reinstated agent usage but metered it aggressively. The effective usage limit for external tools dropped by a factor of twenty-five. The developer community revolted.

This conflict reveals the raw physics of the new economy. Execution is compute. Compute is a physical commodity with a hard marginal cost.

Software companies spent twenty years training us to buy subscriptions. You pay fifty dollars a month, and the employee can use the software continuously. The software was passive. It sat on a server waiting for a human to click a button. The compute cost of a database query is negligible.

Agentic AI operates actively. It initiates actions. It reads the database, decides an account is at risk, drafts an email, reads the response, and updates the record. A single agent might execute ten thousand loops in a day. That requires massive GPU inference.

Software purchasing has ended. You now buy synthetic labor. Anthropic's reaction to OpenClaw is the first warning sign. You will soon see every major AI vendor shift from flat subscription pricing to strict token metering for autonomous actions.

You must manage token burn the exact same way you manage payroll. You must hire financial analysts who understand inference economics. You must calculate the return on investment for every agentic loop. You must decide if a specific autonomous action is worth the physical electricity required to generate it. You must model the token cost of a sales campaign the way you model the physical cost of shipping freight.
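The arithmetic behind that discipline is simple enough to sketch. Everything numeric below is a placeholder assumption, not any vendor's price list: the token counts per loop, the prices per million tokens, and the daily value are invented for illustration. The ten thousand loops per day echoes the agent example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoopCost:
    """Hypothetical per-loop token usage and prices in dollars per million tokens."""
    input_tokens: int
    output_tokens: int
    usd_per_m_input: float
    usd_per_m_output: float

    def usd_per_loop(self) -> float:
        return (
            self.input_tokens * self.usd_per_m_input
            + self.output_tokens * self.usd_per_m_output
        ) / 1_000_000


def daily_cost(cost: LoopCost, loops_per_day: int) -> float:
    """Total inference spend for one agent over one day."""
    return cost.usd_per_loop() * loops_per_day


def roi(value_per_day_usd: float, cost_per_day_usd: float) -> float:
    """Return on investment as (value - cost) / cost."""
    return (value_per_day_usd - cost_per_day_usd) / cost_per_day_usd


# Placeholder numbers chosen only to make the arithmetic visible.
agent = LoopCost(input_tokens=4_000, output_tokens=500,
                 usd_per_m_input=3.0, usd_per_m_output=15.0)
per_day = daily_cost(agent, loops_per_day=10_000)
print(f"${per_day:.2f} per day, ROI {roi(400.0, per_day):.2f}")
```

With these placeholder inputs, each loop costs just under two cents and the agent burns roughly $195 a day. Whether that is cheap depends entirely on the value assigned to the loop's output, which is the number a finance team has to defend.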

Architecting the Barbell

How does a business operate in this reality? You must strip out the execution management layers. Middle management exists to supervise human execution. If execution is compute, middle management serves no purpose.

You need a barbell organization.

On one end of the barbell, you place a small group of high-level intent-makers. These are the strategists, the founders, and the visionaries. They define the exact state the company must reach. They write the prompts that command the infrastructure. They define the strict boundary conditions.

In the middle, you deploy a massive array of agentic compute. You run thousands of concurrent models. They write the code, draft the documents, and execute the logistics. They run continuously. They translate the intent into reality.

On the other end of the barbell, you place a small group of highly specialized sense-makers. These are your senior experts. They stop writing code. They stop drafting contracts. They spend all their time evaluating the output of the machine. They catch the hallucinations. They bridge the logical gaps. They verify that the machine's execution aligns with the original intent.

Every process in your company must be evaluated against this framework. Is this task intent, execution, or sensemaking? If it is execution, it must be handed to the machine. If it is intent or sensemaking, it must be given to your most capable humans.

The companies that realize execution is compute will scale output without scaling headcount. They will trade human payroll for token costs. They will generate massive output with tiny, elite teams. The companies that keep paying human salaries for commodity execution will bleed capital until they disappear. You cannot compete against an organization that buys labor at the cost of electricity. You must rebuild your corporate structure around intent and sensemaking. You must build the state machines to hold your memory. You must control your own architecture.

Sources

  • [AI Co-Mathematician: Accelerating Mathematicians with Agentic AI - arXiv, May 7, 2026](https://arxiv.org/html/2605.06651v1)
  • [Intentmaking and Sensemaking: Human Interaction with AI-Guided Mathematical Discovery, May 7, 2026](https://arxiv.org/html/2605.05921v1)
  • [OpenAI Launches 'Daybreak' to Help Build Secure By Design Software, May 12, 2026](https://www.infosecurity-magazine.com/news/openai-daybreak-secure-by-design/)
  • [Anthropic Traces Six Weeks of Claude Code Quality Complaints to Three Overlapping Product Changes - InfoQ, May 14, 2026](https://www.infoq.com/news/2026/05/anthropic-claude-code-postmortem/)
  • [Anthropic reinstates OpenClaw and third-party agent usage on Claude subscriptions, with a catch | VentureBeat, May 13, 2026](https://venturebeat.com/technology/anthropic-reinstates-openclaw-and-third-party-agent-usage-on-claude-subscriptions-with-a-catch)