The prompts you track determine what your AI-visibility data actually measures, so choosing them from real customer language beats guessing from a keyword list. A workable method: split prompts roughly 80% organic, 10% branded, and 10% competitor; source the wording from real customer signals (PPC queries, support transcripts, live observation); and track the follow-up prompts where decisions actually happen, not just the opening question.
This is a how-to for building that prompt set. It assumes you already know why to measure AI visibility — if not, start with how to measure your AI visibility — and focuses on the one input that most tracking setups get wrong.
Why does prompt selection make or break AI tracking?
Because a tracker only reports on the prompts you give it. Feed it marketer-invented phrasing and you get a confident answer to a question no customer asks. Seer Interactive argued in June 2026 that most teams pick prompts from intuition rather than observed behaviour, and that this is now a real gap. Analysing 387 real prompts across seven studies (July 2025–June 2026), they reported an 83% drop in keyword-search-style prompts, a 270% rise in task-delegation requests ("do this for me"), and a 300% rise in personal information shared inside prompts. Treat those figures as one practitioner's dataset (directional, not definitive), but the implication is clear: people no longer talk to AI the way they typed into Google, so prompts written like search keywords miss how customers actually ask.
The fix is to anchor every tracked prompt in something a real person said.
What's a good split of organic, branded, and competitor prompts?
A practitioner framework shared by Promptwatch's co-founder (June 2026, single-vendor and self-reported) groups prompts into three monitors, weighted toward the category rather than the brand:
| Prompt type | Share | What it captures | Example |
|---|---|---|---|
| Organic | ~80% | Category questions, problems, use cases — no brand named | "What should I look for in a waterproof commuter jacket?" |
| Branded | ~10% | Sentiment and accuracy about your brand | "Is [your brand] good for cycling commutes?" |
| Competitor | ~10% | Where competitors are winning the answer | "Best alternatives to [competitor]?" |
The logic behind the heavy organic weighting: most AI buying journeys start with an unbranded problem, so that's where the share of voice is won or lost. Branded and competitor prompts act as a thermometer — they tell you how you're described and where rivals get named — but they're a small slice because customers rarely open with a brand name. Keep the exact wording realistic; that's what the next section is about.
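To turn the split into actual prompt counts, here is a minimal Python sketch. The function name `allocate_prompts` and the 200-prompt budget are illustrative assumptions, not part of the framework described above; the only input taken from the table is the 80/10/10 weighting. It uses largest-remainder rounding so the counts always add up to the budget.

```python
def allocate_prompts(total, percents=None):
    """Split a prompt budget across monitors by whole-number percentages.

    Hypothetical helper for illustration. Uses largest-remainder rounding
    so the returned counts always sum to `total`.
    """
    if percents is None:
        # Weighting from the table above: organic, branded, competitor.
        percents = {"organic": 80, "branded": 10, "competitor": 10}
    if sum(percents.values()) != 100:
        raise ValueError("percentages must sum to 100")

    counts = {}
    remainders = {}
    for name, pct in percents.items():
        # Integer arithmetic avoids floating-point rounding surprises.
        counts[name], remainders[name] = divmod(total * pct, 100)

    # Hand any leftover slots to the monitors with the largest remainders.
    leftover = total - sum(counts.values())
    for name in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        counts[name] += 1
    return counts


print(allocate_prompts(200))
# {'organic': 160, 'branded': 20, 'competitor': 20}
```

The percentages are a parameter rather than hard-coded, since the split is described as approximate and you may want to adjust it for your category.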
Where should the prompt wording come from?
From real customer signals, in ascending order of effort and quality. Seer's "humanity stack" frames three tiers — use as many as you can:
- PPC and search-query data (start here). Your paid-search query reports are real human phrasings, available cheaply today. They're the fastest way to replace invented prompts with actual language.
- Sales and support transcripts (higher signal). Call notes, chat logs, and tickets reveal the vocabulary, objections, and constraints customers use — the questions they ask a human are the ones they'll ask an AI.
- Live customer observation (highest signal). Watch real customers use AI tools to make a decision. Seer's blunt version: spend 30 minutes observing one real customer in an AI tool before you build a tracking strategy. It surfaces behavioural patterns — and follow-up phrasing — competitors typically never see.
Mining the query fan-out — the sub-questions an engine spawns from one prompt — is a fourth source once you're tracking, and a powerful one for turning that data into content. But the three tiers above are how you seed a realistic set in the first place.
Why track follow-up prompts, not just the opening one?
Because the opening prompt is exploration and the follow-up is the decision — and most tools only measure the opener. Seer reported that 25–50% of prompts in a session are follow-ups, and framed it sharply:
"The opening prompt is where people figure out what to ask. The follow-up prompts are where they decide."
In practice this means tracking the second and third turns of a realistic conversation: "okay, of those, which is best for under £150?" or "which of those ships to India?" Those refinements carry the personal context — budget, location, constraints — that decides who gets recommended. A prompt set that stops at the opening question measures discovery while missing the purchase, and it's a common reason a brand looks present yet still loses the sale. (For the related diagnosis of why you might be absent, see why you're invisible in AI search.)
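One way to keep follow-ups from being forgotten is to store each tracked prompt as a short conversation rather than a single string. The sketch below is an assumption about how you might model that in Python; `TrackedConversation` and `missing_follow_ups` are hypothetical names, and the example turns reuse the phrasings quoted in this article.

```python
from dataclasses import dataclass, field


@dataclass
class TrackedConversation:
    """An opening prompt plus the follow-up turns a buyer might realistically add."""
    opener: str
    follow_ups: list[str] = field(default_factory=list)

    def turns(self):
        """Return (turn_number, prompt) pairs in conversation order."""
        return list(enumerate([self.opener, *self.follow_ups], start=1))


def missing_follow_ups(conversations, min_follow_ups=2):
    """Return the openers that have fewer follow-up turns than the minimum."""
    return [c.opener for c in conversations if len(c.follow_ups) < min_follow_ups]


jacket = TrackedConversation(
    opener="What should I look for in a waterproof commuter jacket?",
    follow_ups=[
        "okay, of those, which is best for under £150?",
        "which of those ships to India?",
    ],
)
opener_only = TrackedConversation(opener="Best alternatives to [competitor]?")

# Flags the conversation that stops at the opening question.
print(missing_follow_ups([jacket, opener_only]))
```

The default of two follow-ups mirrors the "second and third turns" mentioned above; treat it as a starting point rather than a rule.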
A practical checklist for building your prompt set
Pulling it together — audit and rebuild against these:
- Audit the current list. When were the prompts last updated, and did any customer input shape them? If the answer is "we made them up," start over.
- Seed from real language. Pull from PPC queries first, then support and sales transcripts, then live observation.
- Apply the 80/10/10 split. Mostly organic category questions; a thin slice each of branded and competitor.
- Add follow-ups. For every opening prompt, write the realistic second and third turns where a buyer narrows down.
- Include personal context. Test how answers shift with constraints real customers give — budget, location, square footage, use case.
- Sample repeatedly across engines. AI answers are non-deterministic and differ by engine, so each prompt needs multiple samples over time, not a single check.
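To illustrate the last item, here is a hedged sketch of how repeated samples could be rolled up into a mention rate per engine and prompt. Everything named here is an assumption for illustration: the `mention_rates` function, the placeholder engine labels, the brand "Acme", and the sample answers are all invented, and the case-insensitive substring match is a deliberately crude stand-in for real brand detection.

```python
from collections import defaultdict


def mention_rates(samples, brand):
    """Compute the share of sampled answers that mention `brand`.

    `samples` is an iterable of (engine, prompt, answer_text) tuples.
    Returns {(engine, prompt): rate}, with each rate between 0.0 and 1.0.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    needle = brand.casefold()
    for engine, prompt, answer in samples:
        key = (engine, prompt)
        totals[key] += 1
        # Crude check: case-insensitive substring match on the brand name.
        if needle in answer.casefold():
            hits[key] += 1
    return {key: hits[key] / totals[key] for key in totals}


prompt = "What should I look for in a waterproof commuter jacket?"
# Invented answers; in practice these come from repeated runs over time.
samples = [
    ("engine_a", prompt, "Popular picks include Acme and two rivals."),
    ("engine_a", prompt, "Look for taped seams and a drop tail."),
    ("engine_b", prompt, "Acme is often recommended for commuters."),
    ("engine_b", prompt, "ACME and others make solid options."),
]

for (engine, _), rate in mention_rates(samples, "Acme").items():
    print(engine, rate)
# engine_a 0.5
# engine_b 1.0
```

A single check would have reported either 0% or 100% for the first engine depending on which run you caught, which is the reason a rate over several samples is more informative than one answer.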
Do this by hand for a few prompts to learn the texture. Do it at scale (hundreds of realistic prompts, with follow-ups, sampled repeatedly across every engine and turned into a trend) and you've described exactly the job Buffy Intel automates.