Seven Percent Were Ready

Ariel Agor

On March 5, 2026, Cloudera and Harvard Business Review Analytic Services published a survey of more than 230 executives involved in AI data decisions. The headline number landed cleanly. Only 7 percent of enterprises say their data is completely ready for AI. Seventy-three percent admit their organization struggles with the data preparation step that has to happen before a single model touches a single workflow. The sample was small. The implication was not.

Six months earlier, on August 18, 2025, MIT's NANDA initiative published "The GenAI Divide: State of AI in Business 2025." Fifty-two executive interviews. One hundred and fifty-three leadership surveys. Three hundred public deployments analyzed. The finding that traveled the furthest: 95 percent of generative AI pilots inside corporations produce zero measurable P&L impact. MIT's authors named the cause and named it plainly. The models are fine. The data plumbing, the workflow integration, and the absence of a defined outcome are not.

Read together, the two reports describe the same gap from opposite ends. One measured the inputs. The other measured the outputs. The gap between them has a name that most CFOs are still trying to skip past on the way to a vendor decision. The name is data readiness for AI adoption. It is the layer the next decade of margin will be built on, and it is the layer the 2024 wave of pilots was built without.

What Data Readiness for AI Adoption Actually Means

For most of the last decade, "data quality" was a synonym for "the rows join, the dates parse, and the dashboard refreshes by 9 a.m." A model can run on that data. A model cannot trust that data. A model certainly cannot act on it without supervision.

AI-ready data is a stricter standard. It means the schema is documented and versioned so an agent can read its own permissions before it acts. It means lineage is traceable so a hallucination has a forensic trail back to the document that caused it. It means freshness is measured in minutes, not days, because models in production fail on stale context faster than humans notice. It means access boundaries live at the row level so a retrieval call into a CRM does not bypass the permission model the CRM is enforcing for human users. It means the data is governed in writing, with owners and review cadences, not in vibes.
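The five criteria above can be read as a checklist a team runs against each dataset before an agent touches it. The sketch below is one hedged way to express that in Python. The `DatasetMetadata` record, its field names, and the 15-minute freshness threshold are all illustrative assumptions, not part of any report or platform cited here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class DatasetMetadata:
    """Hypothetical catalog record; every field name is illustrative."""
    name: str
    schema_version: Optional[str]   # documented, versioned schema
    lineage_source: Optional[str]   # upstream table or document
    last_refreshed: datetime        # when the data last changed
    row_level_policy: bool          # access enforced per row
    owner: Optional[str]            # named owner, in writing


def readiness_gaps(meta, now, max_age=timedelta(minutes=15)):
    """Return the readiness criteria this dataset fails (empty list = ready)."""
    gaps = []
    if not meta.schema_version:
        gaps.append("schema not versioned")
    if not meta.lineage_source:
        gaps.append("lineage not traceable")
    if now - meta.last_refreshed > max_age:
        gaps.append("data is stale")
    if not meta.row_level_policy:
        gaps.append("no row-level access policy")
    if not meta.owner:
        gaps.append("no named owner")
    return gaps


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    extract = DatasetMetadata(
        name="crm_extract",
        schema_version=None,
        lineage_source=None,
        last_refreshed=now - timedelta(days=3),
        row_level_policy=False,
        owner=None,
    )
    for gap in readiness_gaps(extract, now):
        print(f"{extract.name}: {gap}")
```

A sandbox CRM extract of the kind most pilots ran on would likely fail every check in this list, which is the point of running it before the pilot rather than after.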

The Modern Data Report 2026, drawing on more than 540 data leaders across 64 countries and 29 industries, captured the operational version of this gap. Finding and confirming data takes more time inside the average enterprise than actually using it. Nearly half of respondents cannot fully rely on their data for decisions. The pilots that failed in the MIT NANDA paper failed against exactly this substrate. The data was busy, present, voluminous, and untrustworthy.

Why the Pilot Failed

The standard AI pilot of 2023 through 2025 followed a pattern that almost every board has now seen in person. A budget appeared. A vendor demoed. A model was wired into a sandbox copy of a CRM extract. The pilot looked good in a slide deck. Then the production wiring began.

Production is where the data does the work. In production, the CRM extract is not a flat CSV. It is a live system with row-level permissions, deletes that ripple through dependent tables, schema migrations that fire on the third Tuesday of every quarter, and a help desk that opens 47 tickets a day asking why field X is empty for tenant Y. The model that worked in the demo was working against frozen data. The same model in production confronts living data and a permission lattice nobody documented while the demo was being prepared.

This is the gap MIT NANDA measured. Generic chat tools succeed for individuals because individuals tolerate failure and rephrase. Enterprise tools stall because enterprises have to honor commitments, contracts, and regulations. The model is rarely the bottleneck. The data context the model needs in order to act safely is the bottleneck.

The Schema Lied

In April 2026, a healthcare retrieval-augmented generation platform deployed inside a major hospital surfaced a hallucinated dosage during clinical use. The retrieval pipeline pulled a poisoned reference document from a vector store nobody had audited at the document level. The model rendered the answer with the same confidence it renders accurate ones. A nurse caught the error before harm reached a patient. Within weeks, security researchers publicly detailed what had happened. Five carefully crafted documents inside a multi-million-document vector store can steer responses in a target direction about 90 percent of the time. Five in millions. And that is the hard case for an attacker. Most production retrieval systems in active use today would fall to far less effort than that.

The hospital incident is the visible case. The invisible case sits inside every retrieval system that quietly pulls from a wiki, a shared drive, or a CRM whose security model was built for humans. A naive retrieval call ignores those permissions by default. Pull anything, vectorize it, surface it. The wiki page one team can read becomes a fact the model will state to anyone who asks. The HR portal that should never reach a sales agent suddenly informs the agent's outreach. None of this is hypothetical. It is the operating reality of most enterprise RAG built between 2023 and 2025.
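The difference between a naive retrieval call and a permission-aware one fits in a few lines. The sketch below is a minimal illustration under stated assumptions: the `Document` type and its `allowed_groups` field are hypothetical, and a plain keyword match stands in for vector search so the example stays self-contained.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    """Hypothetical indexed document carrying its source-system permissions."""
    doc_id: str
    text: str
    allowed_groups: frozenset  # groups that may read it in the source system


def naive_retrieve(docs, query):
    """Return every match, ignoring who is asking (the default failure mode)."""
    return [d for d in docs if query.lower() in d.text.lower()]


def permitted_retrieve(docs, query, user_groups):
    """Return only the matches the caller could already read at the source."""
    groups = frozenset(user_groups)
    return [d for d in naive_retrieve(docs, query) if d.allowed_groups & groups]


if __name__ == "__main__":
    corpus = [
        Document("hr-17", "Compensation review notes for the Acme account team",
                 frozenset({"hr"})),
        Document("crm-42", "Acme renewal is due next quarter",
                 frozenset({"sales", "support"})),
    ]
    # A sales agent asks about Acme: the naive call leaks the HR document.
    print([d.doc_id for d in naive_retrieve(corpus, "acme")])
    print([d.doc_id for d in permitted_retrieve(corpus, "acme", {"sales"})])
```

The filter only works if permissions were captured when the document was indexed. A vector store that dropped them at ingestion has nothing to filter on, which is why this is an architecture decision and not a prompt fix.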

If your data is not AI-ready, the model is not the liar. The plumbing is.

Snowflake and Databricks Are Building the Floor

The platforms read the same numbers everyone else reads. In May 2026, Snowflake announced general availability of catalog-linked databases. The pitch is functional and unsentimental. Federate any Apache Iceberg REST catalog (AWS Glue, Databricks Unity Catalog, Microsoft OneLake) into a single environment so the lakehouse serves agents and analytics from the same governed surface. The point of the release is not the table format. The point is the governance surface that sits on top of the table format. One access policy travels with the data across engines. That is what AI-ready means at the platform layer.

On June 16, 2026, Databricks announced Lakehouse//RT, a real-time analytics layer over governed Delta Lake and Apache Iceberg data, powered by a new compute engine called Reyden. The Databricks press release leads with the use case that drove the build. Agents need millisecond serving against authoritative data, not against extract copies that drift the moment they are written. Lakebase Disaster Recovery and Vector Search shipped alongside it. The Databricks roadmap now reads like a list of every piece of data plumbing the agent era forgot to ask for in 2023.

Both companies are responding to the same set of survey findings the consultancies have been publishing all spring. The model layer is commoditizing. The data layer is where the margin lives. If your CIO is still asking which model to standardize on, the CIO is solving last year's problem with this year's budget.

The Data Readiness Bill Came Due

The Informatica CDO Insights 2026 report, published January 27, 2026, from a survey of 600 data leaders, made the budget pattern visible. Eighty-six percent plan to increase data management investment this year. The reasons, in order of weight: privacy and security at 43 percent, AI governance at 41 percent, workforce upskilling at 39 percent. The second item is the interesting one. AI governance is now a budget line of its own. Three years ago that would have been a sub-bullet under security. Two years ago it would have been a column on a risk register. Today it has its own headcount and its own quarterly review.

Grant Thornton's 2026 AI Impact Survey delivered the executional truth behind those investment numbers. Fifty-five percent of CIOs and CTOs report that fewer than half of their core applications are AI-ready. The applications named are the systems the business actually runs on. Sales tools, financial systems, supply chain platforms, customer support stacks. Half of them, or more, cannot expose their data to an agent in any reliable way. Every pilot a company runs against the AI-ready half teaches the executive team about the strengths of the model. Every pilot against the other half teaches the executive team about the limits of the application stack. Both lessons are expensive. Only one of them is useful.

The bill arrives in a recognizable rhythm. A pilot fails. The vendor is blamed. A new vendor is selected. The new vendor's pilot fails for the same reason. After three cycles, somebody on the executive team finally asks the question that should have been asked first. Is our data able to do this? The honest answer, for 93 of every 100 companies, is no. Not yet. Not without a foundation that has not been funded.

What the Seven Percent Did Differently

The Cloudera report describes the 7 percent in clinical terms. They had documented governance, integrated catalogs, lineage, and clear data ownership in place before they shipped a pilot. The Modern Data Report 2026 names the same pattern from a different angle. The Grant Thornton survey adds a financial proof point. Organizations with fully integrated AI are nearly four times more likely to report AI-driven revenue growth than those still piloting. Fifty-eight percent versus 15 percent. The variance is not random and it is not luck.

Three behaviors recur across the 7 percent.

First, they treat the catalog as the front door. The catalog is the metadata layer where ownership, freshness, and access policy are declared. When an agent needs to act, the agent reads the catalog first and the data second. The 7 percent built that pipe before they built an agent. The other 93 percent are still building the agent and asking the catalog to catch up afterward.
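"Catalog first, data second" can be sketched as a gate that every agent read passes through. This is an illustration only. `CatalogEntry`, `CatalogDenied`, and `read_via_catalog` are hypothetical names, and in-memory dictionaries stand in for a real catalog service and real tables.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class CatalogEntry:
    """Hypothetical catalog record declaring owner, access, and freshness."""
    owner: str
    allowed_roles: frozenset
    max_staleness: timedelta
    last_refreshed: datetime


class CatalogDenied(Exception):
    """Raised when the catalog does not clear a read."""


def read_via_catalog(catalog, tables, table, role, now):
    """Consult the catalog entry first; touch the data only if it clears."""
    entry = catalog.get(table)
    if entry is None:
        raise CatalogDenied(f"{table}: not registered in the catalog")
    if role not in entry.allowed_roles:
        raise CatalogDenied(f"{table}: role {role!r} is not permitted")
    if now - entry.last_refreshed > entry.max_staleness:
        raise CatalogDenied(f"{table}: stale, escalate to owner {entry.owner}")
    return tables[table]


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    catalog = {
        "accounts": CatalogEntry(
            owner="sales-ops",
            allowed_roles=frozenset({"sales_agent"}),
            max_staleness=timedelta(minutes=10),
            last_refreshed=now - timedelta(minutes=2),
        )
    }
    tables = {"accounts": [{"id": 1, "name": "Acme"}], "payroll": [{"id": 9}]}
    print(read_via_catalog(catalog, tables, "accounts", "sales_agent", now))
    try:
        read_via_catalog(catalog, tables, "payroll", "sales_agent", now)
    except CatalogDenied as err:
        print(f"blocked: {err}")
```

The design choice worth noting is the default. A table the catalog has never heard of is refused, not served, so the catalog has to exist before the agent can do anything at all.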

Second, they measure data quality where the model reads it, not where the data was born. A pipeline can be clean at the source and rotten at the retrieval point. The 7 percent instrument freshness, completeness, and drift at the retrieval layer. Their dashboards are something an agent product manager actually consults during a sprint review. Most other companies still measure data quality in a quarterly slide somebody emails to the CDO at the end of the quarter.
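Measuring at the retrieval point means scoring the exact records a model is about to read, not the source table they came from. The sketch below is a rough illustration: the chunk layout (a dict with an `updated_at` timestamp plus content fields) and the function name are assumptions, and drift is left out because it needs a baseline distribution to compare against.

```python
from datetime import datetime, timedelta, timezone


def retrieval_quality(chunks, now, required_fields, max_age):
    """Score freshness and completeness of the chunks handed to the model."""
    if not chunks:
        return {"fresh_ratio": 0.0, "complete_ratio": 0.0}
    fresh = sum(1 for c in chunks if now - c["updated_at"] <= max_age)
    complete = sum(
        1
        for c in chunks
        if all(c.get(f) not in (None, "") for f in required_fields)
    )
    return {
        "fresh_ratio": fresh / len(chunks),
        "complete_ratio": complete / len(chunks),
    }


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    retrieved = [
        {"updated_at": now - timedelta(minutes=3), "account": "Acme", "tier": "gold"},
        {"updated_at": now - timedelta(days=4), "account": "Globex", "tier": ""},
    ]
    print(retrieval_quality(retrieved, now, ["account", "tier"], timedelta(minutes=30)))
```

Logged per retrieval call, these two ratios become the kind of dashboard a product manager can check in a sprint review rather than a slide that arrives after the fact.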

Third, they treat human workflow as the thing being automated, not the thing being augmented. The MIT NANDA finding lands here too. The pilots that succeeded had a specific human task they were displacing, with a defined success metric measured weekly. The pilots that failed had an enthusiastic VP and a vague hope. The 7 percent know which task. The 93 percent are still trapped in the demo.

The MIT paper put the success rate at 5 percent. The Cloudera paper put the readiness rate at 7 percent. The two numbers are close to each other for a reason. Readiness predicts success. The 5 percent that produced measurable returns and the 7 percent whose data is ready are, in all likelihood, largely the same population.

Architecture Is the Job

The pattern across all of this is consistent enough to name. The companies winning with AI in 2026 are doing data architecture work that looks like infrastructure work, not science work. They are picking governance models. They are building catalog layers. They are writing retrieval policies. They are designing access lattices. They are doing the unglamorous parts that nobody put in the original AI budget because the original AI budget was a vendor contract.

If you bought an AI tool, you bought a model and a UI. If you architected an AI capability, you built a data substrate that any model can be plugged into and replaced. The first is a contract. The second is a moat. The platforms that will still be paying off in 2028 are the ones whose data layer is portable, observable, and governable. The platforms that bought a model and skinned a chatbot are already learning that the model gets cheaper every quarter and the technical debt does not.

Gartner forecast that, through 2026, organizations will abandon 60 percent of AI projects that lack AI-ready data. That forecast is not a warning about model selection. It is a warning about sequence. The companies that put data readiness first are now shipping faster than the companies that put models first, because the data substrate makes every subsequent model deployment cheap. The companies that put models first are now rebuilding the data foundation underneath live production agents. That is the expensive way.

The Mistake Most Boards Are Making Right Now

Most board AI discussions in June 2026 still treat the AI question as a vendor selection. Which platform. Which model. Which integrator. The Cloudera and Harvard Business Review survey, the MIT NANDA report, the Informatica CDO Insights, the Grant Thornton numbers, and the Gartner forecast all point to the same conclusion. Vendor selection is the last decision, not the first. The first decision is architecture. The second is governance. The third is workflow ownership. The fourth is the measurement surface for data readiness itself. Then, finally, comes the model and the vendor.

A board that runs that sequence backward will run pilots for another twelve months and produce another stack of decks. A board that runs that sequence in order will spend six months on data foundation work, ship a working agent stack on top of a substrate that survives a vendor change, and never need to redo the work when the next model generation lands.

This is not a counsel of caution. It is a counsel of order. Speed comes from sequence done right. Speed does not come from skipping the unglamorous half.

Why You Have to Architect This, Not Buy It

A tool can be bought. A capability has to be built. The pattern the 7 percent followed is reproducible, but it does not arrive in a quarterly subscription package. It looks like a months-long architecture project that delivers a catalog, a governance model, a retrieval policy, a measurable data quality surface, and a workflow ownership map. It produces a substrate every future AI workload can run on without remediation. It does not look like a flashy proof of concept and it does not photograph well at an offsite.

Every vendor selling a model is incentivized to tell you the model is the answer. Every vendor selling an integration is incentivized to tell you the integration is the answer. Almost nobody in the vendor stack is incentivized to tell you the truth that the surveys keep arriving at. Your data is not ready. The model will not save you from that fact. The vendor cannot fix it for you because the architecture decisions belong to you and the data ownership belongs to you. Outsourcing the architecture is how the 93 percent got to 93 percent.

This is the part of the work where most consulting firms hand you a tool and a license. We do not.

The Imperative

If your organization is still running pilots and still calling them pilots, the problem is no longer the model. The Cloudera 7 percent readiness rate, the MIT 95 percent failure rate, the Gartner 60 percent abandonment forecast, the Grant Thornton 55 percent of technology leaders whose core applications are mostly not AI-ready, and the Informatica 86 percent planning to increase data management investment all describe the same situation from different angles. The diagnosis is unanimous. Data readiness for AI adoption is the substrate everything else depends on. Without it, the next pilot fails the same way the last one did, regardless of whose logo is on the slide.

You do not need another tool. You need the architecture that turns every future tool into compounding value rather than another line on the technical debt ledger. The 7 percent built it. The 93 percent are paying for the lesson in pilot failures, stranded vendor commitments, and another year on the wrong side of the gap. The work is real. The sequence matters. The window to do it before competitors catch up to the 7 percent is closing every quarter.

Architect the substrate. Then deploy the agents. In that order. That is the work. We do that work.

Schedule a strategic consultation with us today at our contact page.
