On June 16, 2026, Salesforce announced general availability of Agentforce multi-agent orchestration inside its Agent Fabric control plane. The headline feature is called "guided determinism." It is a quiet, almost embarrassed admission. The pitch from 2024 was that smart agents would chat with one another and the work would flow. The product released last week says the opposite. It says you must write fixed handoff rules between your agents, in code, by hand, or the system will not survive contact with real customers.
That is the whole story of multi-agent systems for business operations in 2026 so far. The agents got smarter. The seams between them got worse. The companies that have shipped real work have stopped trying to make the agents agree and started writing the rules of engagement between them.
This piece is about why that flip happened, what it tells you about the cost of running a fleet of agents inside an operating business, and what an operator should architect before signing a contract with anyone selling a "fabric," a "broker," a "swarm," or a "platform."
The seam is more expensive than the brain
Anthropic's engineering team put a number on this in their own writeup of how they built the Claude research system. A single chat call uses some quantity of tokens. An agentic single-agent run uses roughly four times that. A multi-agent run, where a lead model spawns subagents and orchestrates them, uses roughly fifteen times the tokens of the chat call.
Fifteen times. That multiplier is the cost of coordination. The agents are spending the extra tokens passing each other context, re-reading state, retrying when one subagent does not understand what another produced, and verifying outputs that arrived in a shape the next step did not expect. The reasoning inside any one agent did not get fifteen times deeper. Almost the entire delta is wiring.
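To make the arithmetic concrete, here is a minimal sketch. The 4x and 15x multipliers are the figures cited above. The 2,000-token baseline is a hypothetical number chosen for illustration, not a measurement, and the function names are made up for this example.

```python
# Multipliers as cited in the text: roughly 4x for a single agent,
# roughly 15x for a multi-agent run, both relative to one chat call.
MULTIPLIERS = {"chat": 1, "single_agent": 4, "multi_agent": 15}


def estimated_tokens(chat_tokens: int) -> dict[str, int]:
    """Scale a chat-call token count by each mode's multiplier."""
    return {mode: chat_tokens * m for mode, m in MULTIPLIERS.items()}


def coordination_overhead(chat_tokens: int) -> int:
    """Tokens a multi-agent run spends beyond a single-agent run."""
    est = estimated_tokens(chat_tokens)
    return est["multi_agent"] - est["single_agent"]


baseline = 2_000  # hypothetical tokens for one chat call
print(estimated_tokens(baseline))
# {'chat': 2000, 'single_agent': 8000, 'multi_agent': 30000}
print(coordination_overhead(baseline))
# 22000
```

On those assumed numbers, 22,000 of the 30,000 tokens are the gap between one agent and several. That gap is the line item to question before adding another agent to the diagram.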
Anthropic was direct about the limit. In the same post the team wrote that work where every agent needs the same shared context, or where steps depend tightly on each other, is a poor fit for the multi-agent pattern today. They were describing research, which decomposes well. They were warning operators that most business work does not.
Operations is the worst possible fit for a naive multi-agent rollout. A claim approval needs the policy, the customer history, the discretionary spending limit, the relevant legal posture, and the audit trail. A refund touches billing, ledger, customer record, fraud signal, and tax. A vendor onboarding hits procurement, security review, finance, and IT. Every one of those tasks has heavy shared context. Hand each step to a separate agent and you are paying for the whole context to get copied and re-summarized at every step. You will also lose part of it on every trip.
The fifteen-times figure is not the ceiling. It is closer to the floor, measured on a research-shaped problem. For an operations-shaped problem, where the shared state is large and the steps depend on each other, expect the multiplier to be higher and the quality to be worse.
When the agents argue with each other
Cogent's spring 2026 playbook on multi-agent orchestration called out the single most common production failure mode by name. It is the infinite loop. Agent A passes a task to Agent B. Agent B has slightly different instructions and bounces the task back. Agent C, added by a well-meaning architect to break ties, refuses the work and routes it back to A. The customer sits and waits. The trace eventually trips a timer and the whole thing fails closed.
Anyone who has run a real customer service or back-office function recognizes the shape of this. It is the meeting that ends with "let me get back to you" and never does. Multi-agent systems for business operations reproduce that pattern in software at machine speed. The agents will iterate the same nondecision a hundred times in two minutes. They will burn tokens the whole way. The user will get a polite stall and a vague follow-up.
This is the seam talking. The agents are fine. The handoff has no rules.
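One way to give the handoff a rule is a hop budget that lives outside the agents. The sketch below is an illustration under stated assumptions, not any vendor's mechanism. The class name, the limits of six hops and two visits per agent, and the "escalate_to_human" outcome are all hypothetical.

```python
from dataclasses import dataclass, field


class HandoffLoopError(Exception):
    """Raised when a task crosses more seams than the policy allows."""


@dataclass
class HandoffTrace:
    max_hops: int = 6              # hypothetical budget for total handoffs
    max_visits_per_agent: int = 2  # hypothetical cap on revisiting one agent
    hops: list[str] = field(default_factory=list)

    def record(self, agent: str) -> None:
        """Log a handoff to `agent`, raising if either limit is exceeded."""
        self.hops.append(agent)
        if len(self.hops) > self.max_hops:
            raise HandoffLoopError(f"hop budget exceeded: {self.hops}")
        if self.hops.count(agent) > self.max_visits_per_agent:
            raise HandoffLoopError(f"{agent} revisited too often: {self.hops}")


def run_handoffs(sequence: list[str]) -> str:
    """Replay a sequence of handoffs and report how the task ended."""
    trace = HandoffTrace()
    try:
        for agent in sequence:
            trace.record(agent)
    except HandoffLoopError:
        # Stop the loop early and hand the task to a person.
        return "escalate_to_human"
    return "completed"


print(run_handoffs(["A", "B", "C"]))                      # completed
print(run_handoffs(["A", "B", "C", "A", "B", "C", "A"]))  # escalate_to_human
```

The point of the design is that the stop condition is counted, not argued. The A-to-B-to-C-to-A bounce ends on the seventh hop with a person in the loop, instead of running until a timer fires.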
The shape of an agent loop
The diagnostic is almost always the same. The architect wrote a clean prompt for each agent. The architect did not write the rule that says when Agent A is permitted to hand to Agent C directly, when it must pass through Agent B, what the payload must look like, what counts as "ready to escalate," and what counts as "ready to close." The agents then negotiate that absence at run time, in English, for as long as the system allows. Each negotiation step burns tokens, adds latency, and copies the prior turn into a new context window. None of that work is reasoning. All of it is the cost of an undesigned boundary.
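The missing rule can be small. As a rough sketch, assume a catalog that maps each permitted handoff to the payload fields the receiving agent requires. The agent names, field names, and the `check_handoff` helper below are hypothetical and exist only for illustration.

```python
# Hypothetical seam catalog: (source, target) -> required payload fields.
# Any pair that is not listed is not a permitted handoff.
SEAMS: dict[tuple[str, str], frozenset[str]] = {
    ("intake", "billing"): frozenset({"customer_id", "order_id", "amount"}),
    ("billing", "fraud"): frozenset({"customer_id", "order_id", "risk_reason"}),
}


def check_handoff(source: str, target: str, payload: dict) -> list[str]:
    """Return a list of violations; an empty list means the handoff may proceed."""
    required = SEAMS.get((source, target))
    if required is None:
        return [f"no seam defined from {source} to {target}"]
    missing = sorted(required - set(payload))
    return [f"missing field: {name}" for name in missing]


print(check_handoff("intake", "billing", {"customer_id": 1, "order_id": 2}))
# ['missing field: amount']
print(check_handoff("intake", "fraud", {}))
# ['no seam defined from intake to fraud']
```

A check like this runs before the receiving agent sees anything. The agents never get the chance to negotiate a handoff that was not designed.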
Klarna already ran this experiment for us
In February 2024, Klarna announced that an OpenAI-powered AI assistant was doing the work of 700 human customer service agents and saving the company forty million dollars a year. By the most recent public update, the figure had risen to 853 agents replaced and sixty million dollars saved. The press loved it. Every operator with a customer service budget read the case study.
Then, in May 2025, Sebastian Siemiatkowski reversed course in public. He told reporters the cost-driven push had produced "lower quality" work and that Klarna would start rehiring humans. Customers, he said, would always have a person available if they wanted one. Klarna continues to use AI, with humans staffed alongside.
The press read this as a humility story. It is a coordination story. Klarna did not fail at building an agent. It built a famously capable one. It failed at the seam between the agent and the rest of the work. The cases where the agent could close the loop on its own went well. The cases where the agent had to hand the customer to a human, or escalate to a different system, or admit it did not know, are where quality collapsed. The model held up. The handoff did not.
That is the lesson that should sit on every C-level desk in the second half of 2026. The vendor will demo a clean conversation. The actual work happens at the boundary where the agent stops being able to act. If you have not designed that boundary, you have not designed the system.
Gartner is naming the same shape
Gartner has now put two predictions on the record. On May 26, 2026, the firm published a release saying that uniform governance applied to all AI agents will cause enterprise agent programs to fail. Earlier in the year, Gartner's research said sixty percent of early agentic implementations will fail, and that the single biggest factor will be integration gaps. By 2027, the same research argues, forty percent of enterprises will demote or decommission an autonomous agent because of governance gaps they only noticed after production incidents.
Read those together. The failures sit downstream of model quality. They sit at the wiring. They sit at what happens when an agent has to interact with a system that has different assumptions, a different escalation path, a different idea of what counts as "done." The agents are succeeding inside their context window. They are failing the instant the context window has to talk to another one.
This is why the most interesting product news from this June was not a new model. It was Salesforce shipping Agentforce multi-agent orchestration to general availability inside Agent Fabric. The feature, in plain language, is a place to write fixed rules for how one agent gives work to another. It is the org chart for the bots. It is the policy manual for the swarm. It is the industry admitting that the seam is the system.
What "guided determinism" really means
Salesforce's framing of Agent Script for Agent Broker is worth a careful read. The "guided" part is that agents still reason inside their step. The "determinism" part is that the handoff between steps is no longer up to the agents to negotiate. The architect writes the rule. If the complaint hits a refund threshold, escalate. If the order is on a watchlist, route to fraud. If the customer is a named account, bypass the standard queue. Those choices live outside the agents, in a control plane that sits between them.
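Written as plain code, the three example rules might look like the sketch below. This is not Agent Script syntax and not Salesforce's implementation. The threshold value, the queue names, and the order in which the rules are checked are assumptions made for illustration.

```python
from dataclasses import dataclass

REFUND_ESCALATION_THRESHOLD = 500.00  # hypothetical threshold in dollars


@dataclass(frozen=True)
class Case:
    refund_amount: float
    order_on_watchlist: bool
    named_account: bool


def next_step(case: Case) -> str:
    """Pick the next handoff deterministically. Check order is an assumption."""
    if case.order_on_watchlist:
        return "fraud_review"
    if case.refund_amount >= REFUND_ESCALATION_THRESHOLD:
        return "human_escalation"
    if case.named_account:
        return "priority_queue"  # bypasses the standard queue
    return "standard_queue"


print(next_step(Case(refund_amount=40.0, order_on_watchlist=False, named_account=False)))
# standard_queue
print(next_step(Case(refund_amount=900.0, order_on_watchlist=False, named_account=True)))
# human_escalation
```

The same case always takes the same route, and the route can be unit tested without calling a model. That is the practical difference between a rule in a control plane and a rule in a prompt.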
Strip the brand vocabulary and this is what the discipline of operations has always done with people. A retail bank does not let a teller decide whether to approve a hundred-thousand-dollar wire on judgment. The handoff rule is written on the wall. A hospital does not let an intake nurse decide alone when a chest-pain patient needs a cardiologist. The triage rule is written in the protocol. The humans inside each role bring judgment. The boundary between roles is policy.
Multi-agent systems for business operations need the same shape. The agents bring reasoning inside the role. The seam is policy. If you let the agents negotiate the seam in natural language at run time, you will pay the fifteen-times token tax, you will eat the infinite loop failure mode, and you will discover the integration gap in a Slack thread at midnight.
The other tell is that Salesforce is not the only vendor making this move. MuleSoft Agent Fabric does the same thing under a different name. Microsoft Foundry exposes a control plane between agents on Azure. Amazon Bedrock has been adding orchestration primitives quietly all year. None of them are model companies. The architecture is converging because the failure mode is converging.
The architect's question now
If you run a function that is staring down an agent rollout this quarter, the question is no longer which framework. CrewAI, LangGraph, the Anthropic Agent SDK, the OpenAI Agents SDK, Google's ADK, Salesforce Agent Broker, MuleSoft Agent Fabric. The list will keep growing. The question is who owns the seams.
Concretely. Who decides when a credit hold agent is allowed to release a hold without a human in the loop? Who decides when an onboarding agent is allowed to grant a vendor access to a finance system? Who decides what an agent must log every time it touches a customer record? Who decides what a downstream agent is allowed to assume about a payload another agent handed it? Who decides what happens when two agents disagree?
If the answer to any of those is "we will let the model figure it out at run time," you do not have a system. You have a demo.
The framework you pick gives you a place to write the rules down. The rules themselves are the design work. They are also the work most companies are skipping, because the agent demos look so good that the boundary work feels like overhead. That boundary work is the product.
How an operator should budget for this
Here is a different way to think about the spend. The framework, model, and tooling line is your agent payroll. It will get cheaper as model prices fall, as smaller open models close the gap, and as the inference layer commoditizes. The architecture line, the work of writing the handoff rules between agents and between agents and the rest of your stack, will not get cheaper. That is your choreography line. It is where the durable advantage lives.
If you are an operator looking at proposals from vendors who promise "autonomous" agent fleets, ask one question. Ask them to walk you through, in plain English, the exact rule the system uses to decide when a human is brought in. Then ask where that rule is written. If the rule lives in a prompt, you are buying a research demo. If the rule lives in code outside the agents, with versioning, tests, and an audit trail, you are buying a product.
The same question applies inside your own house. If you are building, the work that compounds is not the agent prompt. It is the catalog of seams. Which agent can hand work to which. Under what condition. With what payload shape. What gets logged. What gets escalated. What gets refused. That catalog is the org chart. The org chart is the system.
This is also how you make the spend defensible to the board. The framework cost is a commodity input that the CFO will benchmark against alternatives. The choreography is a capital asset that compounds with every audited incident and every clean handoff. One is rent. The other is equity in the operation.
The shape of the next twelve months
Three things are about to happen, fast.
The first is that every enterprise vendor that has not yet shipped an orchestration layer will ship one before year end. Salesforce has Agent Fabric. Microsoft has Foundry. Amazon has Bedrock Agents. ServiceNow, Oracle, Workday, SAP, and Databricks will all stake their own claims on the control plane. The model is increasingly the input. The control plane is increasingly the product. Pricing power follows.
The second is that the operators who refuse to write the seam rules will keep blaming the model. Every infinite loop will get diagnosed as a hallucination. Every escalation that did not happen will get blamed on agent autonomy. The real cause is upstream, in the design choice to leave the handoff rule unwritten. The blame will land on the model or the vendor. The fix will stay missed. These programs will sit among the forty percent of enterprises Gartner expects to demote or decommission an agent.
The third is that the operators who do write the seam rules will quietly build moats no one can see in a demo. Their agent stacks will look slower than the competition's at first. They will use fewer models. The control plane code will look boring. The behavior at the boundary will be repeatable, auditable, and trusted by the auditors, the regulators, the customers, and the board. That trust is the asset.
What this means for procurement
There is a short procurement consequence here that operators should write into RFPs now. Vendors should be required to produce, for every agent in a proposed system, three artifacts. First, the list of seams that agent owns, written in plain English, with the payload shape on each side. Second, the rule that governs each seam, expressed in code or a deterministic policy language, with versioning. Third, the audit trail produced at each seam crossing, including the data that left the agent, the data that arrived at the next step, and the decision the seam policy made.
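The third artifact is the easiest to picture as a record. The sketch below shows one possible shape for an audit entry written at each seam crossing. The field names, the rule version string, and the in-memory list standing in for a real log store are all assumptions for illustration.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class SeamAuditRecord:
    seam: str               # for example "billing->fraud"
    rule_version: str       # version of the seam policy that made the decision
    payload_sent: dict      # data that left the upstream agent
    payload_received: dict  # data that arrived at the next step
    decision: str           # what the seam policy decided
    recorded_at: str        # UTC timestamp in ISO 8601 format


def record_crossing(log: list[str], seam: str, rule_version: str,
                    sent: dict, received: dict, decision: str) -> SeamAuditRecord:
    """Build one audit record and append it to the log as a JSON line."""
    record = SeamAuditRecord(
        seam=seam,
        rule_version=rule_version,
        payload_sent=sent,
        payload_received=received,
        decision=decision,
        recorded_at=datetime.now(timezone.utc).isoformat(),
    )
    log.append(json.dumps(asdict(record), sort_keys=True))
    return record


audit_log: list[str] = []  # stand-in for an append-only log store
record_crossing(audit_log, "billing->fraud", "v3",
                {"order_id": 7}, {"order_id": 7}, "route_to_fraud")
print(audit_log[0])
```

Recording both the payload sent and the payload received is what lets an auditor see whether context was lost on the trip, which is where the text above says quality leaks.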
If the vendor cannot produce those three artifacts, the vendor is selling you the agents and asking you to write the system. That is a reasonable arrangement, but only if you are paying the agent price for the agents and the architecture price for the architecture. Most contracts in the market today bundle both and only deliver one.
Conclusion: architect the seams or buy someone else's
A multi-agent system for business operations is a wiring problem dressed up as a model problem. The model gets all the press. The wiring decides whether it works. The Klarna reversal, the Gartner failure prediction, the Anthropic warning about shared context, and the Salesforce Agent Fabric general availability all point at one shift. The smart move in the second half of 2026 is to spend less on which model and more on who owns the rules between them.
You can buy this layer from a vendor and inherit their org chart for your bots. You can outsource the rule writing to a systems integrator and rent their choreography. Or you can architect the seams yourself, write the rules in code you own, audit them with your own people, and keep the moat.
The buy path is the fastest path to a working pilot and the slowest path to a defensible operation. The integrator path is fine if you do not mind paying rent on your own org chart forever. The architect path is the only one that produces an asset on your balance sheet at the end. It is also the only one where the audit trail belongs to you.
A consulting partner gets you there. Agor AI Advisory designs the seams between your agents the way an industrial engineer designs the seams between your people. We pick the framework after we pick the rules, not before. We name the boundary every agent is allowed to cross and the boundary it must escalate. We write the choreography that turns a fleet of agents into an operation you can audit, scale, and own.
Schedule a strategic consultation with us today.
Sources
- How we built our multi-agent research system, Anthropic Engineering
- Salesforce Agentforce Multi-Agent Orchestration Hits GA, TechTimes, June 16, 2026
- Salesforce Advances Agent Fabric with Guided Determinism and Governance Controls, Salesforce News
- Gartner Says Applying Uniform Governance Across AI Agents Will Lead to Enterprise AI Agent Failure, Gartner Press Release, May 26, 2026
- Gartner Says 60% of AI Agent Deployments Will Fail, StackOne
- Klarna AI Customer Support Efficiency, Twig
- When AI Agents Collide: Multi-Agent Orchestration Failure Playbook for 2026, Cogent
- Multi-Agent Cost Compounding, Augment Code
