The Day We Found Out We Didn't Exist
A few days ago we ran a full audit of our own search visibility. Rankings, indexing, structured data — the usual checklist. Everything looked healthy until we asked a different question: can AI search engines actually read this site?
The answer was no.
Cloudflare — the service that sits in front of our site, like it sits in front of roughly a fifth of the web — had been quietly turning away AI crawlers. GPTBot, the crawler behind ChatGPT. ClaudeBot. PerplexityBot. All of them, blocked at the door. Not because we chose that. Because a default setting chose it for us.
Think about what that means in practice. Someone asks ChatGPT, "what's a good tool for finding local business leads?" ChatGPT cannot see our site. It cannot cite us, recommend us, or even know we exist. Every AI-assisted buying conversation in our category was happening without us in the room.
We fixed it the same day. Then we had the obvious next thought: if this happened to us — a company that writes about search visibility — how many ordinary businesses are sitting behind the same closed door right now?
So we built the detection into our enrichment pipeline and started measuring. This post covers what we found, how the blocking happens, how to check any single website in thirty seconds, and how to find every affected business in a city at once.
Why This Is Everywhere
There are two main ways a business ends up invisible to AI, and neither involves the owner making a decision.
The first is robots.txt. Every website has (or should have) a small file at yourdomain.com/robots.txt that tells crawlers what they may and may not read. Over the past two years, blocking AI crawlers in robots.txt went from a niche publisher protest to a packaged feature. Security plugins offer it as a toggle. Some hosts apply it as a preset. Cloudflare offers a "managed robots.txt" that writes the block for you.
The second is network-level blocking. Since mid-2025, Cloudflare has blocked AI crawlers by default on newly added domains, refusing the request before it ever reaches the website. There's no trace in robots.txt, nothing in the site's code, and no dashboard warning the owner is likely to see. The site looks perfectly normal to humans and to Google. It's simply dark to AI.
The publishers who block AI crawlers on purpose — news sites, forums, large content businesses — have reasons and made a choice. The New York Times blocks fourteen different AI bots by name. Reddit blocks fifteen. Fair enough; that's a deliberate licensing position.
But a plumber in Charlotte? A dental clinic in Chicago? A local law firm? They never made that choice. They have no licensing position. They just want customers to find them — and a growing share of customers now start their search by asking an AI.
"The site looks perfectly normal to humans and to Google. It's simply dark to AI — and the owner has no idea."
What Invisibility Actually Costs
The numbers behind AI search have stopped being a curiosity. Google's AI Overviews appear on more than half of queries in many categories. ChatGPT passed 200 million weekly active users in 2024 and handles a growing share of product and service research through its built-in search. Perplexity processes hundreds of millions of queries a month.
More importantly, the kind of query moving to AI is exactly the kind local businesses live on: "best HVAC company near me that does same-day service," "find me a family dentist in Lincoln Park that takes my insurance," "which marketing agencies in Austin work with restaurants?" These used to be ten blue links. Increasingly they're a single synthesized answer with three citations.
If an AI assistant can't crawl your site, you are not one of the citations. Your competitor — whose site happens to be readable — is. The customer never sees a list you could have ranked on. There's no position eleven to claw back from. You're simply absent.
And unlike a rankings drop, nothing alerts the owner. Traffic from Google continues as before (Googlebot is a different crawler and is almost never blocked). Revenue erodes at the margin, query by query, as AI answers take share — and the business never learns why.
How to Check Any Single Website in 30 Seconds
You can audit one site by hand right now.
Open theirdomain.com/robots.txt in a browser and look for blocks like this:
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: PerplexityBot
Disallow: /
That's a full block: each named crawler is asked to stay away from the entire site. The names to scan for are GPTBot and OAI-SearchBot (OpenAI / ChatGPT), ClaudeBot (Anthropic / Claude), PerplexityBot, Google-Extended (Google's opt-out token for Gemini, honored by its existing crawlers rather than a separate bot), CCBot (Common Crawl, which feeds many models), Bytespider (ByteDance), Applebot-Extended (Apple's equivalent opt-out token), and Amazonbot.
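If you'd rather not scan the file by eye, the same check can be sketched in a few lines of Python using the standard library's robots.txt parser. This is a minimal illustration, not our production detection: the function name blocked_ai_crawlers and the example.com URL are placeholders we made up, and the crawler list is simply the names given above.

```python
from urllib.robotparser import RobotFileParser

# Crawler names taken from the list above.
AI_CRAWLERS = [
    "GPTBot", "OAI-SearchBot", "ClaudeBot", "PerplexityBot",
    "Google-Extended", "CCBot", "Bytespider", "Applebot-Extended", "Amazonbot",
]


def blocked_ai_crawlers(robots_txt: str, site_url: str = "https://example.com/") -> list[str]:
    """Return the AI crawlers that robots_txt asks to stay off the homepage.

    robots_txt is the file's text, already downloaded; site_url is a
    hypothetical placeholder for the page being tested.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in AI_CRAWLERS if not parser.can_fetch(bot, site_url)]


sample = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: PerplexityBot
Disallow: /
"""

print(blocked_ai_crawlers(sample))
```

Note that a blanket rule (User-agent: * with Disallow: /) will report every crawler as blocked, which is accurate, but it also means the site is turning away far more than AI bots.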
While you're there, check one more thing: does theirdomain.com/llms.txt exist? That's the opposite signal — an emerging convention where a site publishes a plain-text summary specifically for AI systems. A site with llms.txt is leaning into AI visibility; a site blocking GPTBot is locked out of it.
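One wrinkle when checking for llms.txt automatically: many sites answer any unknown path with a friendly HTML page and a 200 status, so "the URL loaded" isn't proof the file exists. The sketch below assumes the response has already been fetched and applies a rough heuristic of our own: the llms.txt proposal describes a Markdown file that opens with an H1 title, so we look for that. The function names are hypothetical.

```python
def llms_txt_url(domain: str) -> str:
    """Build the conventional llms.txt location for a bare domain or homepage URL."""
    host = domain.strip().removeprefix("https://").removeprefix("http://")
    host = host.split("/")[0]  # drop any path that came along with the domain
    return f"https://{host}/llms.txt"


def looks_like_llms_txt(status: int, body: str) -> bool:
    """Heuristic: a 200 response whose first non-empty line is a Markdown H1.

    This is an assumption about what a 'valid' file looks like, not an
    official validator. It mainly filters out HTML error pages served with 200.
    """
    if status != 200:
        return False
    for line in body.lstrip("\ufeff").splitlines():
        stripped = line.strip()
        if stripped:
            return stripped.startswith("# ")
    return False


print(llms_txt_url("https://example.com/"))
print(looks_like_llms_txt(200, "# Acme Plumbing\n\n> Emergency plumbing in Charlotte."))
print(looks_like_llms_txt(200, "<!DOCTYPE html><html><body>Not found</body></html>"))
```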
Two honest caveats. First, robots.txt is a request, not a wall. The major AI crawlers publicly commit to honoring it, though, which is exactly why it matters. Second, network-level blocking (the Cloudflare default) leaves no trace in robots.txt, so a clean robots file doesn't guarantee the site is readable. If a site's robots.txt itself refuses to load for automated tools while the homepage works fine in a browser, that's often the tell.
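That second caveat can be turned into a rough test. The sketch below is an assumption-laden illustration, not a definitive detector: fetch_status is a small helper built on Python's urllib, the set of "refusal" status codes is our own guess at what bot protection typically returns, and the labels are made up. Bot-protection services also look at signals beyond status codes, so treat the result as a hint.

```python
import urllib.error
import urllib.request

# Status codes we treat as "the door was shut" (an assumption, not a standard).
REFUSAL_STATUSES = {401, 403, 429, 503}


def fetch_status(url: str, user_agent: str, timeout: float = 10.0) -> int:
    """Return the final HTTP status for url, or 0 if the connection fails."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with an error status
    except OSError:
        return 0  # DNS failure, refused connection, timeout, and so on


def network_block_hint(homepage_status: int, robots_status: int) -> str:
    """Compare how the homepage and robots.txt respond to an automated client."""
    homepage_ok = 200 <= homepage_status < 300
    if homepage_ok and robots_status in REFUSAL_STATUSES:
        return "likely network-level blocking"
    if homepage_ok and (robots_status == 404 or 200 <= robots_status < 300):
        return "no sign of network-level blocking"
    return "inconclusive"


print(network_block_hint(200, 403))
```

In practice you would call fetch_status once for the homepage with a browser-like User-Agent and once for /robots.txt with a plain script User-Agent, then pass both results to network_block_hint.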
The Accidental Visibility Divide
When we ran our detection across enriched leads, a pattern emerged that surprised us: AI visibility among small businesses is almost entirely accidental, in both directions.
On the invisible side: businesses behind bot-protection services whose robots.txt won't even load for automated readers — locked doors they didn't know they had.
On the visible side, something stranger. In one batch of plumbing companies, 17 percent had a valid llms.txt file. Plumbers, publishing cutting-edge AI-discoverability files? The explanation: Rank Math, one of the most popular WordPress SEO plugins, now generates llms.txt automatically. These businesses are AI-visible because of a plugin default, just as their neighbors are AI-invisible because of a CDN default.
Almost nobody in either group made a decision. That's the market gap. Every business on the wrong side of this divide is a prospect for whoever explains it to them first — with evidence.
Finding Every Affected Business in Your Market
Checking robots.txt by hand works for one site. It doesn't work for a territory.
This is why we built AI-crawler detection into Lyre Leads enrichment. When you search any niche and city — "dentists in Denver," "law firms in Manchester" — every result's website is checked automatically, and each business gets an AI Access verdict:
Blocked — the site's robots.txt turns away the major AI crawlers. The business is invisible to ChatGPT, Claude, and Perplexity, and almost certainly doesn't know.
Partial — some AI crawlers are blocked, others aren't. Usually a half-configured plugin or an old copy-pasted robots file. Still a conversation starter.
Open — nothing in robots.txt blocks AI crawlers.
Unknown — the site's robots.txt couldn't be read by automated tools at all. In our data this cluster is dominated by aggressive bot protection, which tends to block AI crawlers along with everything else. These are strong audit candidates too.
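To make the four verdicts concrete, here is a simplified sketch of how such a classification could work. It is not the Lyre Leads implementation: the function name, the choice of three "major" crawlers, and the convention of passing None when robots.txt couldn't be read are all assumptions for illustration.

```python
from typing import Optional
from urllib.robotparser import RobotFileParser

# The three assistants named in the Blocked verdict above.
MAJOR_AI_CRAWLERS = ("GPTBot", "ClaudeBot", "PerplexityBot")


def ai_access_verdict(robots_txt: Optional[str]) -> tuple[str, list[str]]:
    """Return (verdict, blocked_bots) for a site's robots.txt text.

    Pass None when the file could not be read by automated tools at all.
    """
    if robots_txt is None:
        return "Unknown", []
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    blocked = [bot for bot in MAJOR_AI_CRAWLERS if not parser.can_fetch(bot, "/")]
    if not blocked:
        return "Open", blocked
    if len(blocked) == len(MAJOR_AI_CRAWLERS):
        return "Blocked", blocked
    return "Partial", blocked


print(ai_access_verdict("User-agent: GPTBot\nDisallow: /\n"))
```

A missing robots.txt (a clean 404) would be passed as an empty string and come back Open, while a file that refuses to load at all maps to Unknown, matching the distinction drawn in the list above.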
You also see which bots are blocked, and whether the site publishes llms.txt. Filter the whole search to AI Access: Blocked, export the list with emails and decision-maker names, and you have something rare in cold outreach: a list of businesses with a specific, demonstrable, fixable problem they don't know they have.
The Pitch That Writes Itself
If you sell SEO, web design, or marketing services, notice what this gives you. Most cold outreach opens with a generic claim ("we help businesses like yours grow"). This one opens with a verifiable fact about their business:
"When someone asks ChatGPT for a recommendation in your category, your website can't be read — it's blocked by a setting on your site. Your competitor on Main Street isn't. Want me to send over a screenshot of what I found?"
It's specific. It's checkable in thirty seconds (you can literally walk them through their own robots.txt on a call). The fix is fast, which makes it a perfect low-friction first engagement, and the natural follow-up is the rest of the AI-visibility stack: structured data, content that AI engines can cite, and an llms.txt done on purpose rather than by accident. We covered the structured-data half of that pitch in "Why schema markup is the key to getting found by AI search"; the two services bundle naturally.
And because the verdict comes attached to the rest of the enrichment (review counts, tech stack, contact names), you can stack signals. A business with 3.4 stars, an outdated WordPress install, and an AI-blocked website isn't one prospect; it's three openers aimed at the same decision-maker. (If review-based targeting is your angle, see "Finding SEO clients through low review counts.")
The Window Is Open Now
Here's the part worth acting on. Awareness of this problem is near zero among small businesses, and the supply of people selling the fix is tiny. That combination never lasts. Within a year or two, "AI visibility audit" will be a standard line item on every agency's services page, hosting providers will surface warnings, and the easy version of this conversation will be gone.
Right now, you can be the first person to tell a business owner that AI assistants can't see them. We know how that conversation lands — we had it with ourselves a few days ago, and we fixed it within the hour. The businesses in your market will want to fix it too. Someone is going to be the one who shows them. The only question is whether it's you.
Find the Invisible Businesses in Your Market
Search any niche and city, and Lyre Leads checks every business's AI-crawler access automatically — alongside 40+ other data points, verified emails, and decision-maker names. Filter to "AI Access: Blocked" and your prospect list builds itself. Free plan includes 500 tokens.
Start free — no credit card required
Lyre Leads