For almost every site, no — a separate "AI version" isn't worth building. The higher-leverage move is to make your one real website reachable and parseable. The 2026 evidence backs this: in an Ahrefs analysis of 137,210 domains, 97% of published llms.txt files received zero requests in May 2026, with the major AI crawlers fetching ordinary HTML directly instead.
This matters because a popular strand of advice says the opposite: strip out your layout, remove the "noise," and hand language models a simplified, machine-only view. It sounds efficient. It mostly isn't — and it can quietly work against you.
Should you build a stripped, machine-only version of your site for AI?
No, with one narrow nuance. There are two different things people mean by an "AI version," and only one is risky:
- Serving the same content in a cleaner, more machine-legible form — semantic HTML, structured data, and (for simple bots) a Markdown or text representation alongside your normal pages. This is sound, and we cover the how in how to make your website agent-readable.
- Maintaining a separate, stripped mirror — a parallel "AI-only" site or a divergent simplified page that exists apart from what humans see. This is the one to avoid: it adds cost, invites divergence, and chases a benefit the data doesn't support.
The distinction is between "the same truth, more legibly" and "a second, different truth." The first helps. The second is the trap.
What does the llms.txt data actually show?
llms.txt is the clearest test case for the "give AI its own file" idea, and the 2026 numbers are blunt. Ahrefs analyzed domains that received traffic in May 2026 and looked at who actually fetched their llms.txt:
| Finding (Ahrefs, 137,210 domains, May 2026) | Reported figure |
|---|---|
| Domains publishing a valid llms.txt | ~28% (~38,000 sites) |
| Of those, files that received zero requests | 97% |
| Files that got any requests at all | ~3% (~1,100 domains) |
| Of requests that did arrive, share from bots | ~96% |
| GPTBot share of AI requests | ~4.5% |
| OAI-SearchBot share of AI requests | ~0.7% |
Ahrefs reported that the major crawlers (GPTBot, ClaudeBot, PerplexityBot, OAI-SearchBot, Google-Extended) overwhelmingly skip the file and crawl HTML directly. It concluded that the cons outweigh the pros for most sites, and that the file is defensible mainly where your audience uses coding agents. That is the state of play as of mid-2026; it could change if an engine commits to reading the file, so date any claim you make about it. This is fresh confirmation of a verdict we'd already reached in is llms.txt worth it: cheap hygiene, not a lever.
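You can run the same check on your own site rather than rely on aggregate figures. The sketch below counts requests for /llms.txt in access-log lines and groups them by crawler. It assumes the common "combined" log format, and the sample lines are invented for illustration. The function name `count_llms_txt_hits` is hypothetical. Google-Extended is left out on the assumption that it is a robots.txt control token and does not show up as a request user agent.

```python
import re
from collections import Counter

# Assumption: logs use the "combined" format, where the request line and the
# user agent are both double-quoted fields.
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Crawler names taken from the article; matched as substrings of the user agent.
AI_CRAWLERS = ("GPTBot", "ClaudeBot", "PerplexityBot", "OAI-SearchBot")


def count_llms_txt_hits(lines):
    """Count requests for /llms.txt, grouped by AI crawler (or 'other')."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # skip lines that are not in the assumed format
        path = match.group("path").split("?", 1)[0]  # ignore query strings
        if path != "/llms.txt":
            continue
        agent = match.group("agent")
        name = next((bot for bot in AI_CRAWLERS if bot in agent), "other")
        hits[name] += 1
    return hits


# Invented sample lines, for illustration only.
sample = [
    '203.0.113.5 - - [12/May/2026:10:00:00 +0000] "GET /llms.txt HTTP/1.1" 200 512 "-" "Mozilla/5.0; compatible; GPTBot/1.2"',
    '203.0.113.9 - - [12/May/2026:10:01:00 +0000] "GET /pricing HTTP/1.1" 200 9001 "-" "Mozilla/5.0; compatible; ClaudeBot/1.0"',
    '198.51.100.7 - - [12/May/2026:10:02:00 +0000] "GET /llms.txt HTTP/1.1" 200 512 "-" "curl/8.5.0"',
]
print(dict(count_llms_txt_hits(sample)))  # {'GPTBot': 1, 'other': 1}
```

If the counter stays empty for the named crawlers over a full month, that matches the pattern Ahrefs reported, and the file is hygiene, not a lever, for your site too.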
Why does a stripped "machine-only" page lose meaning?
Because a page is more than the words on it. As SEO engineer Jono Alderson put it in a June 2026 essay, "a page is not just a container for words — it's an editorial artefact" with "hierarchy, emphasis, framing and intent baked into it." Placement, prominence, and context change how information is interpreted; strip them away and you optimise for extraction at the cost of understanding.
A separate machine-only version creates three concrete problems:
- Divergence. Once two representations of the same content exist, they drift. The human page gets updated; the AI mirror lags — and now engines (and you) have two versions of the truth to reconcile.
- Trust and cloaking risk. Serving systematically different content to bots than to people is the definition of cloaking. Even when well-intentioned, it gives an engine a reason to distrust the source rather than cite it.
- A shrinking payoff. Models are getting better at rendering and reasoning over real pages, not worse. The simplified mirror you build today is a maintenance burden you'll still be carrying when the engines no longer need it.
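Divergence is easy to detect mechanically, which makes the risk concrete. As a rough sketch, the check below extracts the visible text from a human-facing HTML page, compares its vocabulary with a machine-only mirror, and reports words that appear on one side only. The function `drift` and the sample page are hypothetical, and a word-set comparison is a deliberately crude stand-in for a real content diff.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def _words(text):
    """Lowercased set of alphanumeric tokens; markup punctuation is dropped."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def drift(html_page, mirror_text):
    """Return (words only on the human page, words only in the mirror)."""
    extractor = _TextExtractor()
    extractor.feed(html_page)
    extractor.close()
    page_words = _words(" ".join(extractor.parts))
    mirror_words = _words(mirror_text)
    return sorted(page_words - mirror_words), sorted(mirror_words - page_words)


# Invented example: the human page was updated, the mirror was not.
page = "<h1>Pricing</h1><p>The Pro plan costs 49 dollars a month.</p>"
mirror = "# Pricing\n\nThe Pro plan costs 39 dollars a month."
print(drift(page, mirror))  # (['49'], ['39'])
```

A non-empty result is exactly the "two versions of the truth" problem: the mirror is now telling engines something the human page no longer says.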
Optimise the page, not a shadow of it. A stripped mirror trades the signals models actually use — structure, context, corroboration — for a short-lived convenience, and leaves you maintaining two versions of the truth.
When is a machine-readable layer actually worth it?
When it's the same content, served more cleanly, at no divergence cost. Concretely:
- Exposing your core content as clean Markdown or text for simple bots, generated from the same source as your HTML so the two can't disagree.
- Sites whose customers genuinely use coding agents — the one audience Ahrefs flagged where a dedicated file is defensible.
- Adding llms.txt as near-zero-cost hygiene, accurate and auto-generated, with no expectation that it moves citations.
The test is simple: if the machine-readable layer is a byproduct of your real content, keep it. If it's a separate thing a human has to maintain, you've built the liability, not the asset.
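One way to pass that test is to render both representations from a single content record, so the Markdown cannot say anything the HTML does not. This is a minimal sketch under that assumption. The `Page` record and both render functions are hypothetical, and a real site would do the same thing inside its CMS or static-site generator.

```python
from dataclasses import dataclass, field
from html import escape


@dataclass
class Page:
    """Hypothetical content record: the single source both outputs come from."""
    title: str
    paragraphs: list = field(default_factory=list)


def render_html(page):
    """Human-facing representation, with text escaped for HTML."""
    body = "\n".join(f"<p>{escape(p)}</p>" for p in page.paragraphs)
    return f"<article>\n<h1>{escape(page.title)}</h1>\n{body}\n</article>"


def render_markdown(page):
    """Machine-friendly representation built from the same record."""
    body = "\n\n".join(page.paragraphs)
    return f"# {page.title}\n\n{body}\n"


# Invented example content.
page = Page("Pricing", ["The Pro plan costs 49 dollars a month."])
print(render_html(page))
print(render_markdown(page))
```

Editing the `Page` record is the only way to change either output, so nobody has to maintain the Markdown separately.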
What should you do instead?
Spend the effort on the levers that the evidence supports — the same ones that make a page good for a screen reader make it good for an AI crawler:
- Be reachable. Serve server-rendered HTML, and confirm that your CDN isn't blocking AI crawlers.
- Be parseable. Real semantic structure (headings, lists, tables, real <a>/<button>), not <div> soup. See accessibility and AI-parseability.
- Label your facts. Structured data tells engines what things are instead of making them guess.
- Stay fresh and authoritative. A current page from a corroborated entity gets retrieved; a stale, lone-voice page doesn't — that's the freshness cliff at work.
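To make "label your facts" concrete, here is a small sketch that builds a schema.org Article description as a JSON-LD script tag, including the dateModified value that signals freshness. The function name `article_json_ld`, the example values, and the choice of an Organization author are all assumptions for illustration. Which properties suit your pages depends on your content type.

```python
import json
from datetime import date


def article_json_ld(headline, author, published, modified, url):
    """Build a schema.org Article description as a JSON-LD script tag."""
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Organization", "name": author},
        "datePublished": published.isoformat(),
        "dateModified": modified.isoformat(),
        "mainEntityOfPage": url,
    }
    # Escape "</" so a value can never close the surrounding script tag early.
    payload = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


# Invented example values.
snippet = article_json_ld(
    headline="Should you build an AI version of your site?",
    author="Example Co",
    published=date(2026, 6, 1),
    modified=date(2026, 6, 15),
    url="https://example.com/ai-version",
)
print(snippet)
```

Generating the tag from the same record that produces the page keeps the labelled facts and the visible content in step.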
Make the one site you already have legible, current, and trustworthy — then measure whether engines actually surface it. Watching that visibility across every engine over time, so you can tell hygiene from a real lever, is exactly what Buffy Intel is built for.