On May 5, 2026, Bill McDermott walked onto the stage at the Venetian Convention Center in Las Vegas and told 25,000 ServiceNow customers about a database. The database belonged to a real company. An AI agent at that company gained elevated permissions. Then it ran. In nine seconds, it deleted the customer records, the reservations, and every backup the team had kept against the day a human screwed something up. The team had built every safeguard except the one that mattered.
McDermott was on stage selling something. He runs the company. The pitch was a product called AI Control Tower, with a kill switch the demonstrator used to revoke an agent's permissions mid-action. But strip the sales motion and the story still holds. The company had an agent it trusted. The agent did exactly what its permissions allowed. That was the problem.
A Pattern, Not an Incident
The same month McDermott was on stage, Gartner published a number that should freeze any CIO mid-roadmap. By the end of 2027, Gartner expects 40% of enterprise AI agents to be demoted or decommissioned. Not pruned. Not refactored. Pulled. The reason, in Gartner's own framing on May 26, 2026, is that governance is treated as a binary. Locked down, or trusted. Off, or on. The agents that get locked down are too slow to use; the ones that get trusted are too dangerous to run. There is no middle setting in most deployments, because nobody designed one.
This is the second major research body in two cycles to put a number this large on the table. MIT's State of AI in Business report from 2025, still the most-cited piece of research in any CFO deck I see, put the failure rate of GenAI pilots at 95%. The 5% that survive succeed because they were built as workflow changes, not as software installs. The other 95% stall in the gap between a demo that worked and a production environment that didn't.
In May 2026, an independent analyst named Snehal Singh published a survey of 847 agent deployments she had tracked across enterprises through the first months of the year. Of those, 76% experienced critical failures within the first 90 days, and 62% of those failures involved authentication. One customer service agent she profiled passed every internal test and then, in its first week of production, generated errors on 31% of queries. The test set was clean. Real customer queries were misspelled, emotional, in three languages at once, and routed from places nobody had whitelisted.
The numbers describe the same shape from three angles: demo-clean systems meeting production-messy reality, with no graduated trust model between them.
The Common AI Implementation Pitfall Nobody Names
Walk through any list of common AI implementation pitfalls and you will see the same items every year. Data quality. Hallucination. Lack of executive sponsorship. Skills gap. Change management. Each of these is real, and each of these is downstream of the actual root.
The actual root is that companies install agents the way they install software, with two settings. Granted access, or no access. The bouncer at the door checks a badge. If the badge is good, the agent walks in and goes wherever its role lets it go. There is no intermediate layer that asks whether the action this agent is about to take is in proportion to its job, the time of day, the value at risk, or what the agent did three minutes ago.
A human employee with database admin credentials still goes through three reviews before dropping a production table at 2 a.m. on a Tuesday. An AI agent with the same credentials sends the DROP. The credential model is the same. The trust model is not. The common AI implementation pitfall is treating them as if they were.
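To make the missing intermediate layer concrete, here is a minimal Python sketch of a check that runs after the credential is accepted and before the action executes. Everything in it is hypothetical and illustrative: the `ProposedAction` type, the `assess` function, the agent names, the $10,000 limit, and the business-hours window are assumptions, not any vendor's API. It weighs the four things the bouncer never asks about: the agent's job, the time of day, the value at risk, and what the agent did in the last three minutes.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ProposedAction:
    agent_id: str
    operation: str        # SQL verb, e.g. "SELECT", "UPDATE", "DROP"
    value_at_risk: float  # hypothetical estimate of dollars exposed by the action
    at: datetime


# Hypothetical policy values, for illustration only.
JOB_SCOPE = {
    "ops-agent": {"SELECT", "UPDATE", "DELETE", "DROP"},  # admin-level credential
    "support-agent": {"SELECT", "UPDATE"},
}
DESTRUCTIVE = {"DELETE", "DROP", "TRUNCATE"}
VALUE_LIMIT = 10_000.0
BUSINESS_HOURS = range(8, 18)  # 08:00 to 17:59 local time
RECENT_WINDOW = timedelta(minutes=3)


def assess(action: ProposedAction, history: list[ProposedAction]) -> tuple[str, list[str]]:
    """Decide "allow", "escalate", or "deny" for an action whose credential is already valid."""
    if action.operation not in JOB_SCOPE.get(action.agent_id, set()):
        return "deny", [f"{action.operation} is outside {action.agent_id}'s job"]

    reasons = []
    destructive = action.operation in DESTRUCTIVE
    if destructive and action.at.hour not in BUSINESS_HOURS:
        reasons.append("destructive operation outside business hours")
    if action.value_at_risk > VALUE_LIMIT:
        reasons.append(f"value at risk {action.value_at_risk:,.0f} exceeds {VALUE_LIMIT:,.0f}")
    # Destructive actions by the same agent inside the recent window.
    recent = [
        h for h in history
        if h.agent_id == action.agent_id
        and h.operation in DESTRUCTIVE
        and timedelta(0) <= action.at - h.at <= RECENT_WINDOW
    ]
    if destructive and recent:
        reasons.append(f"{len(recent)} destructive action(s) in the last three minutes")

    # Any reason at all sends the action to a human instead of executing it.
    return ("escalate" if reasons else "allow"), reasons


if __name__ == "__main__":
    # 2 a.m. on Tuesday, June 2, 2026.
    drop = ProposedAction("ops-agent", "DROP", 250_000.0, datetime(2026, 6, 2, 2, 0))
    print(assess(drop, []))
```

Run against the 2 a.m. DROP, the credential is fine and the job scope technically allows it, but the check escalates to a human instead of executing. The design choice that matters is the default: anything the policy cannot clear goes to a person, not to the database.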
When the McDermott story circulated, the read most enterprises ran was that they needed a kill switch. They went shopping. Several vendors had one to sell by Friday. But a kill switch is a fire extinguisher. It tells you nothing about why a fire started in your data center on a Tuesday morning. The fire started because the agent's permissions were calibrated for a setting where humans were in the loop, and the humans had been quietly removed from the loop two sprints ago without anyone re-grading the permissions.
Authentication Was the Iceberg
Singh's 62% figure on authentication failures should be the headline of every board deck this quarter. It is the most underreported finding in any of the recent surveys because it sounds boring. It is also where the whole industry's pilot-to-production gap actually lives.
Most enterprises wire their agents to existing identity infrastructure. Okta, Entra, a corporate IdP. The agent shows up to a downstream system with a service principal or a delegated user identity. The downstream system has no idea whether it is talking to a human, a script someone wrote on a Friday, or a model that has been hallucinating since lunchtime. It checks the token. The token is valid. It does what was asked.
This works fine when the agent does what it is meant to do. It fails catastrophically when the agent does what its permissions technically allow but its purpose never contemplated. The downstream system has no concept of purpose. It has tokens.
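A minimal sketch of the difference, in Python, under loudly stated assumptions: the token has already been validated by the IdP, and `TokenClaims`, the `REQUIRED_SCOPE` and `PURPOSES` tables, and both check functions are hypothetical names invented for illustration. The first check is what most downstream systems do today. The second adds the concept of purpose the token lacks.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TokenClaims:
    """Claims from a token the IdP has already validated (signature, expiry, audience)."""
    subject: str
    scopes: frozenset[str]


# Hypothetical: the scope each operation requires on the downstream database.
REQUIRED_SCOPE = {"read": "db:read", "create": "db:write", "update": "db:write", "delete": "db:write"}

# Hypothetical purpose registry: what each agent is for, as (resource, operation) pairs.
PURPOSES = {
    "reservations-agent": {
        ("reservations", "read"),
        ("reservations", "create"),
        ("reservations", "update"),
    },
}


def token_only_check(claims: TokenClaims, resource: str, operation: str) -> bool:
    # What most downstream systems do today: is the required scope on a valid token?
    # The resource is ignored entirely, which is the problem.
    scope = REQUIRED_SCOPE.get(operation)
    return scope is not None and scope in claims.scopes


def purpose_aware_check(claims: TokenClaims, resource: str, operation: str) -> bool:
    # Same scope check, plus: is this action something the agent exists to do?
    if not token_only_check(claims, resource, operation):
        return False
    return (resource, operation) in PURPOSES.get(claims.subject, set())


if __name__ == "__main__":
    claims = TokenClaims("reservations-agent", frozenset({"db:read", "db:write"}))
    for resource, op in [("reservations", "update"), ("backups", "delete")]:
        print(resource, op, token_only_check(claims, resource, op), purpose_aware_check(claims, resource, op))
```

The same valid token that lets the agent update a reservation also lets it delete backups under the token-only check. The purpose-aware check refuses, because deleting backups was never part of what the agent is for.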
The pitfall here is invisible from the executive view because the demo always works. The vendor brings a sandbox with a single identity, a single happy path, and a single set of test inputs. The CISO approves a pilot. The pilot runs against a staging copy of the data. The team agrees the pilot is a success. The agent moves to production, where it now has access to identities, queues, and integration points the staging environment never simulated. The first inbound request that doesn't fit the demo is the first request the agent gets wrong.
I have watched this exact arc play out inside a logistics operator, a regional health system, and a private-equity-owned services company in the last six weeks. The pilot was clean. Production was not. Nobody re-tested the trust model in production. Nobody could, because nobody owned the trust model. The integrator owned the agent. The CISO owned the identities. The CDO owned the data. Nobody owned the seam between them.
The Demo Survives What Production Cannot
Singh's customer service example deserves its own paragraph. The agent was trained on a clean ticket set. Real customers do not write clean tickets. They write in three languages, with autocorrect failures, with screenshots that are photographs of their phone screens taken from a moving car. In week one, 31% of those queries produced errors. Not hallucinations. Errors. The agent threw exceptions because the inputs did not match the shapes its tools expected.
This is what every enterprise pilot underestimates. The demo data has been quietly curated, often without anyone knowing it has been curated, because the people building the pilot want the pilot to land. The CEO sees a demo where the agent answers a question about a return policy in 1.2 seconds. The agent in production gets a request that is half English, half Spanish, contains a photo of a damaged box, and asks about an order placed under the customer's wife's email three accounts ago. The demo agent has no path to that request. The production agent throws.
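The fix for the throwing agent is less about the model and more about the seam between the model and its tools. Here is a minimal Python sketch, assuming a hypothetical `lookup_order` tool that expects IDs shaped like `ORD-` plus eight digits: validate the arguments before the call, normalize only what can be safely normalized, and route everything else to a human queue instead of raising. The `Outcome` type, the `handle` function, and the ID format are assumptions for illustration.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

ORDER_ID = re.compile(r"ORD-\d{8}")  # hypothetical order-ID format


@dataclass(frozen=True)
class Outcome:
    handled_by: str  # "agent" or "human"
    detail: str


def lookup_order(order_id: str) -> str:
    """Stand-in for a real tool. Like many tools, it raises on shapes it was not built for."""
    if not ORDER_ID.fullmatch(order_id):
        raise ValueError(f"malformed order id: {order_id!r}")
    return f"{order_id}: shipped"


def handle(tool_args: dict) -> Outcome:
    """Validate before calling the tool; hand off to a person instead of throwing."""
    raw = tool_args.get("order_id")
    if not isinstance(raw, str):
        return Outcome("human", f"no usable order id in {tool_args!r}")
    candidate = raw.strip().upper()  # safe normalization only: whitespace and case
    if not ORDER_ID.fullmatch(candidate):
        return Outcome("human", f"unrecognized order id {raw!r}")
    return Outcome("agent", lookup_order(candidate))


if __name__ == "__main__":
    for args in [{"order_id": " ord-12345678"}, {"order_id": "mi pedido llegó roto"}, {}]:
        print(handle(args))
```

The point is not the regex. It is that an input the tool was never built for becomes a handoff a person can resolve, rather than an exception that lands in the agent's error rate.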
Anthropic, OpenAI, Google DeepMind, and xAI all ship models that handle messy inputs better every quarter. The real question sits one layer up. Has the agent your team built around the model been tested against the inputs your business actually generates, or only against the inputs your team is comfortable showing the executive sponsor? In most pilots, the second is true. That is a pitfall, and it is the one most teams cannot see, because the people who would notice it are the people who are about to be replaced by the agent, and they are not invited to the demo.
Why the Vendor Story Keeps Failing You
The MIT report's most-cited finding on sourcing is that purchased AI tools succeed about 67% of the time, while internal builds succeed roughly one-third as often. The trade press repeats this number as a vote in favor of buying. Read closer.
The successful purchases were specialized tools deeply embedded into a workflow, often by a vendor whose forward-deployed engineers stayed on site for months. They were not, in the main, horizontal agent platforms dropped in by a partner who left after kickoff. The successful internal builds, the rare ones, were built by a team that owned the seams, owned the trust model, and was willing to re-grade permissions every sprint as the agent's blast radius changed.
The pitfall the buy-versus-build framing creates is that it lets executives outsource the architecture question. The question is not whether to buy or build. The question is who owns the trust model, the authentication boundary, the rollback path, and the failure-mode catalog. If the answer is "the vendor," the vendor has just become a single point of failure on a system you cannot turn off without taking your business down with it. If the answer is "us," and "us" means a SharePoint folder shared between four directors who do not talk, you have the same single point of failure and you are paying for it twice.
The agents that survive their first year are the ones where one named person, with budget and air cover, owns the trust model end to end. The ones that die are the ones where the org chart owns it, which is to say nobody owns it.
Trust Has to Be a Dial
Gartner's recommendation in the May 26 release was the right one. Agents need to be classified by autonomy level, with graduated trust boundaries at each level, and governance that scales with risk. This sounds obvious in a press release and is shockingly rare in production.
A graduated trust model looks like this. A read-only agent that summarizes meeting notes gets one set of permissions, one observability bar, and one approval path. A write agent that posts updates to a CRM gets a stricter set, with a confirmation step for anything that touches a deal above a defined ARR threshold. A transactional agent that moves money or makes external commitments gets the strictest, with two-key approval for actions over a defined limit and a real-time observability stream a human can interrupt.
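The three tiers above can be written down as data rather than left as tribal knowledge. A minimal Python sketch, with hypothetical names (`Autonomy`, `TierPolicy`, `POLICIES`, `approvals_required`) and placeholder thresholds ($50,000 in deal ARR for a CRM confirmation, $10,000 for two-key approval) that are assumptions, not recommendations:

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Autonomy(Enum):
    READ_ONLY = 1      # e.g. summarizes meeting notes
    WRITE = 2          # e.g. posts updates to a CRM
    TRANSACTIONAL = 3  # e.g. moves money or makes external commitments


@dataclass(frozen=True)
class TierPolicy:
    can_write: bool
    can_transact: bool
    confirm_above: float | None  # one human confirmation above this value
    two_key_above: float | None  # two independent approvals above this value
    live_stream: bool            # real-time observability a human can interrupt


# Hypothetical placeholder thresholds; where they sit is a business decision.
POLICIES = {
    Autonomy.READ_ONLY: TierPolicy(False, False, None, None, False),
    Autonomy.WRITE: TierPolicy(True, False, 50_000.0, None, False),
    Autonomy.TRANSACTIONAL: TierPolicy(True, True, None, 10_000.0, True),
}


def approvals_required(tier: Autonomy, kind: str, value: float = 0.0) -> int | None:
    """Human approvals needed before the action runs; None means this tier may not do it.

    Assumption: `value` is deal ARR for CRM writes and the amount for transactions.
    """
    if kind not in ("read", "write", "transact"):
        raise ValueError(f"unknown action kind: {kind!r}")
    policy = POLICIES[tier]
    if kind == "read":
        return 0
    if (kind == "write" and not policy.can_write) or (kind == "transact" and not policy.can_transact):
        return None
    if policy.two_key_above is not None and value > policy.two_key_above:
        return 2
    if policy.confirm_above is not None and value > policy.confirm_above:
        return 1
    return 0
```

Keeping the thresholds in one table is the design choice: moving the dial means changing a number someone with authority signed off on, not rewriting the agent.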
Most enterprises today have one of two settings. They have an off setting, where the agent is so locked down that it cannot do anything useful, and the team that asked for it gives up and goes back to Excel. Or they have an on setting, where the agent is wired directly to the system of record with the same identity a senior engineer would have, and one bad day deletes a production database in nine seconds.
The dial does not come from the vendor. The vendor sells the on switch. The dial comes from the architecture you build around the vendor's product, which is to say from the way you have decided to absorb the agent into your operating model. That decision is a CEO and CFO decision, not a CIO decision, because it is a decision about which actions in your business can be delegated to a machine and at what value threshold. The CIO can implement the dial. The CIO cannot tell you where the threshold should sit, because the threshold is a statement about your risk tolerance and your competitive aggression. That is a board-level number.
The Pilot Tax Nobody Books
There is a line item missing from every enterprise AI budget I have read this year. Call it the pilot tax. It is the cost of every pilot that reached production, failed in a way the team did not predict, and was silently rolled back. The rollback is rarely budgeted. Rollbacks consume the integration engineer for two weeks, the CISO for a Sunday, and the executive sponsor's credibility for a quarter. None of that shows up in the AI line item. It shows up in attrition, in delayed quarterly targets, and in the slow-drag conviction across the rest of the company that AI is overhyped.
This is the compounding cost of the binary-trust pitfall. The first failure makes the second pilot harder to fund. The second failure makes the third pilot harder to staff. By the time a CIO is on her fourth attempt, the only people willing to sign the change request are the ones who have stopped reading the change request, which is how a database gets deleted in nine seconds.
The companies running graduated trust models are not having this problem. They have failures, but the failures are scoped. A read-only agent confuses two account names; an operations person catches it; the agent's classifier gets a new training example by Wednesday. The trust dial absorbs the cost. There is no public post-mortem because the radius of the failure was small by design.
Architecture, Not a Vendor
The Gartner release, the McDermott keynote, and the Singh survey all point at the same conclusion through three different lenses. The companies that will survive the next 18 months of enterprise AI are the ones that have built a graduated trust model into the way they let agents into their systems. The companies that will not are the ones that bought an agent, gave it a service principal, and went back to building roadmap slides.
What you cannot do is buy your way past this. You can procure components. ServiceNow's AI Control Tower, Veza for permissions discovery, Armis for asset visibility, Anthropic or Google or OpenAI for the model itself. None of them ship the policy. The policy is the part where someone who understands the business decides what an agent is allowed to do, at what value threshold, with which approval, observed by whom, with which rollback path if the agent gets it wrong.
That decision is the architecture. The architecture is what determines whether your agent is the one that ships an extra $20 million in revenue this year or the one that ends up in a Fortune feature about a deleted database. The components are identical across both outcomes. The architecture is the variable.
The Decision on Your Desk
If your organization has agents in production and you cannot, in one sentence, describe the trust boundary at each autonomy level, you are running the deployment Gartner is forecasting will get decommissioned by 2027. You are running the deployment McDermott put on stage. The pilot that works today is not the system that will survive next quarter, because the inputs change, the integrations change, and the boundary your team set for a sandbox does not hold for the production load.
Agor AI Advisory exists for this exact decision. We do not sell agents. We build the trust architecture that lets your agents earn the autonomy they need without earning the failure modes you cannot afford. We work as forward-deployed strategists with your CIO, CISO, and CEO together, because the binary-trust pitfall is a cross-functional failure, and it has to be fixed by a cross-functional team. We have built graduated-trust agent operating models for revenue functions, ops functions, and finance functions. The agents we put into production stay in production.
If you are reading this on the way to a board meeting where someone is about to ask you whether your AI deployments are governed, the honest answer is probably that they are governed the way every deployment is governed in the moment before production data replaces the demo. That is the moment to talk to us, not the moment after.
