For the first version of the AI Visibility Index, we did the thing everyone does. We asked a model to guess. Feed an LLM the public corpus about a category, ask it which brands an AI assistant would probably name, and it returns a confident, plausible ranking. It looks like data. It reads like measurement. It is neither. It is a well-read intern's hunch, dressed in a leaderboard.
The fix was embarrassingly obvious once we said it out loud. If the question is "what does ChatGPT actually tell a buyer," then type the buyer's question into ChatGPT and read the answer. Then do it again in Gemini, Copilot, and Grok. Capture the response. Save the transcript. Cite it. We rebuilt the product (the AI Visibility Index and Audit at app.mvat.ai, built on MVAT, by Agor AI Advisory) around that one move: every claim now points at an openable transcript, not a guess. You can read the actual answer the assistant gave, at app.mvat.ai/evidence.
The first thing that happened when we stopped inferring and started measuring was that the inferred winner lost.
ClickUp, Not Monday
For B2B project-management software, the inferred index said Monday.com was number one. It is the brand with the loudest public footprint, the most ad spend, the most review-site presence. If you were guessing from the corpus, you would guess Monday too.
The real assistants disagree. Across thirteen captured answers to project-management buying questions, ClickUp was named in eleven of them. ClickUp is number one, and it is not close. Linear, which the inferred index missed entirely (too new, too developer-niche to dominate the public corpus), shows up in the real answers as a genuinely ranked option. The corpus underweighted it. The assistants do not.
That gap, Monday on paper versus ClickUp in practice, is the entire argument for the rebuild in one example. The assistant is not a mirror of the public corpus. It has its own opinion, and the only way to know that opinion is to ask it.
Where The Ad Layer Showed Up
Running shoes is where it got interesting. Nike is number one and it is unanimous, named in every single organic answer. No surprise there.
The sharp finding is not the winner. It is what a loser does about losing. New Balance is named in only one of seven organic answers. The recommendation engine mostly will not say its name. So New Balance bought ChatGPT's labeled "Sponsored" slot, on two separate queries. It cannot earn the organic recommendation, so it pays for the slot sitting right next to it. And the ad slot rotates by query: New Balance on some running-shoe questions, HOKA on others. There is an auction running inside the answer, and the brands that the engine will not recommend on merit are the ones bidding.
The pattern repeats in a market where you would never expect it: NYC personal-injury lawyers. A single firm, Gair Gair Conason, is number one across both ChatGPT and Gemini, and across personal-injury, car-accident, and medical-malpractice queries. One firm, every surface, every variation. Meanwhile ChatGPT runs a "Sponsored" slot in that category, and it is not a law firm at all. It is an auto-accident lead-generation company buying its way into the conversation the law firms are winning organically.
Three categories, one through-line. There is an organic answer (the brand the assistant names because it decided to), and there is a paid answer (the brand that bought the slot because the assistant would not name it). They are different brands. That difference is the whole story.
The Market With No Answer
Then there is the result we are proudest of, because it is a non-result.
For marketing and SEO agencies, there is no answer. ChatGPT and Gemini name almost entirely different firms, with near-zero overlap. Gemini even personalizes its picks to the user's city. There is no cross-surface consensus, which means there is no honest ranking to publish. So we published nothing.
That was a real decision, and it was contested. We had a category, we had captures, we had a leaderboard slot waiting to be filled. The easy move is to rank the firms by whatever thin signal exists and ship it. We refused. A hollow leaderboard dressed up as data is worse than no leaderboard, because it lies with more confidence. For fragmented, personalized markets, the honest finding is that "who do AI assistants recommend" has no stable answer yet. Saying so, out loud, in the product, is the finding.
A measurement system that only ever produces a ranking is not measuring. It is performing. The ability to return "there is no answer here" is what separates the two.
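One way to make "no cross-surface consensus" checkable is to compare the sets of firms each assistant names. The sketch below is illustrative only and is not the code behind the Index: the function names, the choice of Jaccard overlap as the measure, and the 0.5 threshold are all assumptions made for the example.

```python
from itertools import combinations


def jaccard(a, b):
    """Share of names two surfaces have in common (0.0 = none, 1.0 = identical)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def has_consensus(names_by_surface, min_overlap=0.5):
    """Return True only if every pair of surfaces overlaps enough.

    names_by_surface maps a surface label (e.g. "chatgpt") to the set of
    brands it named. min_overlap is a hypothetical cutoff, not a published one.
    """
    surfaces = list(names_by_surface.values())
    if len(surfaces) < 2:
        return False  # one surface alone cannot show agreement
    return all(
        jaccard(x, y) >= min_overlap for x, y in combinations(surfaces, 2)
    )


def leaderboard_or_none(names_by_surface, min_overlap=0.5):
    """Return the names every surface agrees on, or None when there is no consensus."""
    if not has_consensus(names_by_surface, min_overlap):
        return None  # the honest result: nothing to publish
    shared = set.intersection(*(set(v) for v in names_by_surface.values()))
    return sorted(shared)
```

The design point is the `None` branch: the function is allowed to return "there is no answer here" instead of always producing a list.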
The Council That Would Not Sign
None of this ships on our say-so. The ranking is gated by an autonomous three-agent council (an operator, a skeptic, and a strategist) that argues to consensus before anything publishes. On this dataset, the skeptic held the line: it refused to sign off until every single citation pointed at a real, openable transcript, not a synthesized summary of one. That hold sent us back to serve the actual captures at app.mvat.ai/evidence so a reader can open the receipt. Only then did the council clear it.
There is a second floor underneath that. A brand is only ranked if at least three separate real answers name it. One mention is noise. Two is a coincidence. Three is a signal. The rule exists so that no company gets stamped "last place" on the strength of a single bad capture. If the evidence is thin, the brand does not get ranked at all, in either direction. We would rather have a shorter, true list than a complete, fragile one.
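The three-mention floor is simple enough to sketch. What follows is a minimal illustration under stated assumptions, not our production code: the function name, the data shape (one set of brand names per captured answer), and the tie-breaking by name are all hypothetical. Only the threshold of three comes from the rule above.

```python
from collections import Counter

MIN_MENTIONS = 3  # from the rule above: at least three separate real answers


def rank_brands(captures, min_mentions=MIN_MENTIONS):
    """Rank brands by how many captured answers name them.

    captures is a list with one collection of brand names per captured
    answer. Brands below min_mentions are left out entirely rather than
    ranked last.
    """
    counts = Counter()
    for brands in captures:
        counts.update(set(brands))  # count a brand once per answer
    ranked = [(b, n) for b, n in counts.items() if n >= min_mentions]
    # Most-mentioned first; ties broken alphabetically (an assumption).
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked
```

With placeholder data, a brand named in only two answers simply does not appear in the output; it is not stamped with a low rank.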
What This Means If You Sell Anything
Here is the shift, stated plainly. For twenty years the question was "do I rank on Google." That question is being replaced. The new one is "does the assistant name me, and if it will not, can I buy my way into the answer next to the brand it does name."
AI assistants are becoming the place buyers go to ask "what's best." Inside that answer, an ad layer is forming in real time: ChatGPT's labeled Sponsored slots, Copilot's shopping cards, an auction that rotates by query. The brands that cannot earn the organic recommendation are already learning to buy the slot beside it. That is not a forecast. We watched it happen in three categories this week and saved the transcripts.
If you are a brand, the uncomfortable question is which side of that line you are on. Are you the name the assistant volunteers, or the name that has to pay to appear in the same breath? You cannot answer that by guessing, and you definitely cannot answer it by asking a model to guess for you. You answer it by reading the real responses, the way a buyer would actually see them.
Which is the one line worth keeping: you can only manage what you measure, and measuring AI visibility means reading the real answers, not guessing them.
The Index and the Audit live at app.mvat.ai. The product is live on the web, and our companion apps are live on the Apple App Store. If you want to know what the assistants actually say about your category, with a transcript you can open behind every claim, that is what we built.
