Measuring AI Visibility: Why One Check Is Not Enough

AI assistants built on large language models (LLMs), such as ChatGPT, Perplexity, and Gemini, are becoming a new starting point for how people discover products, companies, and information online.

For marketers, this creates a new problem: how do you know whether your brand is visible in AI-generated answers?

The obvious first approach is to treat AI search like SEO: enter a prompt, check whether your brand appears, and record the result. A single check like this is not a reliable measurement.

Traditional rank tracking is built around relatively stable search results. AI search behaves differently. A single check can show you one version of the answer, but not the full picture of your brand's actual visibility.

According to a 2026 study from the University of St. Gallen, AI visibility should be measured probabilistically, not deterministically. In simple terms: you should not ask, 'Did my brand appear in this one answer?' You should ask, 'How often does my brand appear across repeated checks?'
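As a minimal sketch of what "how often" means in practice, the Python snippet below computes the share of repeated answers that mention a brand. The function name `mention_rate`, the sample answers, and the brand names are all hypothetical, and the simple case-insensitive substring match is an assumption; a real setup would likely need more careful brand matching.

```python
def mention_rate(answers, brand):
    """Share of answers that mention the brand (case-insensitive substring match)."""
    if not answers:
        raise ValueError("need at least one answer")
    hits = sum(brand.lower() in answer.lower() for answer in answers)
    return hits / len(answers)


# Hypothetical answers from five runs of the same prompt.
runs = [
    "Popular options include Acme and Globex.",
    "Many teams choose Globex or Initech.",
    "Acme, Initech, and Globex are common picks.",
    "Globex is a frequent recommendation.",
    "Consider Acme or Umbrella.",
]

print(mention_rate(runs, "Acme"))    # 0.6 (mentioned in 3 of 5 runs)
print(mention_rate(runs, "Globex"))  # 0.8 (mentioned in 4 of 5 runs)
```

Checking only the second run would suggest Acme is invisible, while checking only the first would suggest it is always present. The rate across all five runs is the more useful number.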

Why AI Search Is Different

In traditional search, rankings can move, but they are usually not completely random. If your page ranks third today, there is a good chance it will be somewhere near that position tomorrow.

AI search is different because LLMs generate answers probabilistically. Even when you use the exact same prompt, the answer can change from one run to another.

This means your brand can appear in one answer and disappear in the next one, even if nothing meaningful has changed on the web.

The study describes this as an inclusion-exclusion problem. In classic SEO, a brand might move from position three to position five. In AI search, the change can be much more binary: your brand is either mentioned or not mentioned at all.

That makes one-time checks unreliable. If your brand appears once, it may be a lucky inclusion. If it does not appear, it may be an unlucky exclusion. Neither result is enough to understand your real AI visibility.

What Happens When You Re-run the Same Prompt

To test this, the researchers ran the same prompts multiple times in succession.

The results showed that AI answers can vary significantly, even when the prompt is identical and the checks happen close together.

For cited sources, the overlap between repeated runs was only 32-43%. Brand mentions were somewhat more stable, but still far from consistent. In day-to-day comparisons, brand overlap averaged only 45-59%.
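To make the idea of overlap concrete, here is one way to compare two runs of the same prompt. This sketch uses Jaccard overlap (shared items divided by all distinct items), which is an assumption: the article does not say which overlap measure the researchers used. The domain names are placeholders.

```python
def jaccard_overlap(run_a, run_b):
    """Shared items divided by all distinct items across two runs."""
    set_a, set_b = set(run_a), set(run_b)
    union = set_a | set_b
    if not union:
        return 1.0  # two empty runs are treated as identical
    return len(set_a & set_b) / len(union)


# Hypothetical cited sources from two runs of the same prompt.
run_1 = ["example.com", "docs.example.org", "news.example.net"]
run_2 = ["example.com", "blog.example.io", "news.example.net", "wiki.example.edu"]

print(jaccard_overlap(run_1, run_2))  # 0.4 (2 shared out of 5 distinct sources)
```

The same function works for brand mentions: pass the list of brands extracted from each answer instead of the cited domains.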

This is important because it shows that volatility is not only caused by external changes, such as new pages being indexed or websites being updated. A large part of the variation comes from the AI systems themselves.

So if you check your brand once and do not see it, that does not automatically mean your brand has no visibility. It may simply mean it was not selected in that specific answer. The same works in reverse: seeing your brand once only means you appeared in one generated response.

AI Citations Are Highly Concentrated

The study also found that AI search has a strong winner-takes-most pattern.

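A relatively small number of authoritative domains receive most of the citations in AI-generated answers. The researchers measured this concentration using the Gini coefficient, where 0 means citations are spread evenly across domains and values close to 1 mean a single domain receives nearly all of them. They found an average score of 0.715 across platforms.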

Google AI Mode had the highest concentration, with a score of 0.782. Perplexity was more distributed, with a score of 0.671.
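For readers who want to compute a similar concentration score on their own citation data, the sketch below implements the standard Gini coefficient formula for a list of per-domain citation counts. The function name and the sample counts are hypothetical, and the study's exact calculation may differ in detail.

```python
def gini(counts):
    """Gini coefficient of non-negative counts: 0 is perfectly even."""
    values = sorted(counts)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        raise ValueError("need at least one non-zero count")
    # Weight each value by its 1-based rank in ascending order.
    weighted = sum(rank * value for rank, value in enumerate(values, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Hypothetical citation counts for four domains.
print(gini([5, 5, 5, 5]))   # 0.0  (every domain cited equally)
print(gini([0, 0, 0, 10]))  # 0.75 (one domain receives every citation)
print(round(gini([1, 1, 2, 6]), 3))  # 0.4
```

With a finite number of domains, the maximum value is (n - 1) / n rather than exactly 1, which is why the second example prints 0.75 for four domains.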

For brands and publishers, this matters because AI visibility is not evenly spread. A few sources often dominate the answer space, while many others are mentioned rarely or not at all.

What Proper AI Visibility Monitoring Requires

Because AI answers are unstable, measuring visibility properly requires repeated checks over time.

A single prompt run is too noisy to be useful for professional analytics. The researchers estimate that reducing measurement error to a comfortable level requires running each prompt around 7 to 8 times per day. In practice, many teams may start with daily checks and add more runs as budgets and monitoring needs grow.
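The effect of repeated runs on measurement error can be illustrated with the textbook standard error of a proportion. This is a generic approximation, not the study's own calculation, and it assumes that runs are independent, which may not hold for answers generated close together. The function name and the chosen run counts are illustrative.

```python
import math


def standard_error(p, n):
    """Standard error of an observed mention rate p based on n independent runs."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n <= 0:
        raise ValueError("n must be positive")
    return math.sqrt(p * (1 - p) / n)


# Worst case p = 0.5: the uncertainty shrinks as runs accumulate.
for n in (1, 8, 56, 168):
    print(n, round(standard_error(0.5, n), 3))
```

Under these assumptions, a single run carries a standard error of 0.5, which is as large as the quantity being measured, while 8 runs bring it to roughly 0.18. Going from 8 to 56 runs cuts it further, but each additional run helps less than the one before.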

Short-term data can also be misleading. AI answers fluctuate from day to day, and model updates can affect visibility. The study suggests that stable per-brand visibility estimates require a rolling observation window of about 21 to 24 days.
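A rolling window can be sketched as follows. The snippet pools all runs inside each window before dividing, so days with more runs carry more weight. The function name, the data layout (one `(mentions, runs)` pair per day), and the three-day window are assumptions chosen to keep the example short; a window matching the study's recommendation would be 21 to 24 days.

```python
def rolling_mention_rate(daily, window):
    """Pooled mention rate for each full window of consecutive days.

    daily: list of (mentions, runs) tuples in chronological order.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    rates = []
    for end in range(window, len(daily) + 1):
        chunk = daily[end - window:end]
        mentions = sum(m for m, _ in chunk)
        runs = sum(r for _, r in chunk)
        rates.append(mentions / runs if runs else None)
    return rates


# Hypothetical data: 8 runs per day over five days.
daily = [(3, 8), (5, 8), (2, 8), (6, 8), (4, 8)]
rates = rolling_mention_rate(daily, window=3)
print([round(r, 3) for r in rates])  # [0.417, 0.542, 0.5]
```

The daily rates in this example swing between 0.25 and 0.75, while the windowed rates stay between roughly 0.42 and 0.54, which is the smoothing effect a longer observation window is meant to provide.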

Visibility also depends heavily on how the question is asked. A brand may appear for one prompt but not for another similar one. Reliable monitoring should include a portfolio of prompts covering different user intents, phrasings, and stages of the customer journey.
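One simple way to summarize a prompt portfolio is to compute a mention rate per prompt and then average those rates, so that each prompt counts equally regardless of how many times it was run. The prompts, results, and function name below are hypothetical, and equal weighting is only one possible design choice; weighting prompts by estimated search demand would be another.

```python
def portfolio_visibility(results):
    """Per-prompt mention rates and their unweighted average.

    results: dict mapping prompt text to a list of booleans,
    one per run (True if the brand was mentioned).
    """
    per_prompt = {
        prompt: sum(hits) / len(hits)
        for prompt, hits in results.items()
        if hits  # skip prompts with no runs yet
    }
    if not per_prompt:
        raise ValueError("need at least one prompt with runs")
    overall = sum(per_prompt.values()) / len(per_prompt)
    return per_prompt, overall


# Hypothetical results for three prompts, four runs each.
results = {
    "best crm for startups": [True, False, True, True],
    "crm with email automation": [False, False, True, False],
    "alternatives to spreadsheet sales tracking": [False, False, False, False],
}

per_prompt, overall = portfolio_visibility(results)
print(per_prompt["best crm for startups"])  # 0.75
print(round(overall, 3))                    # 0.333
```

Keeping the per-prompt rates alongside the average matters: here the brand looks strong for one phrasing and absent for another, which a single overall number would hide.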

Conclusion

AI visibility is not a fixed ranking. It is a probability.

That is the main difference between traditional SEO tracking and AI search monitoring. In SEO, it often makes sense to ask, 'Where do we rank?' In AI search, the better question is, 'How likely are we to be mentioned?'

A single check cannot answer that. It can only show one possible version of the result.

To understand real AI visibility, marketers need repeated measurements, multiple prompts, and enough historical data to separate actual trends from random variation. Without that, decisions are based less on analytics and more on chance.

Source: Schulte, J., Bleeker, M., & Kaufmann, P. (2026). Don't Measure Once: Measuring Visibility in AI Search (GEO). University of St. Gallen.
