Free tool · Catches deprecated user-agents · CDN-aware

Are you actually allowing the right AI bots? Most crypto sites are not.

Q: What is the difference between training crawlers and search crawlers?

Training crawlers and search crawlers are different bots from the same company. GPTBot trains OpenAI models; OAI-SearchBot powers ChatGPT live search results. ClaudeBot trains Anthropic models; Claude-SearchBot powers Claude search results, and Claude-User fetches pages on demand when a user asks Claude a question. Blocking the training bot has zero effect on the search bot. The single biggest mistake we see on crypto sites is blocking everything from Anthropic or OpenAI. That one decision removes the site from AI search answers entirely, whereas a different decision (block training, allow search) preserves AI visibility while keeping content out of training datasets.
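The independence of the two bots can be sketched with Python's standard-library `urllib.robotparser`. The two-line robots.txt and the `/docs/` path are illustrative assumptions, not output from the checker.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks only OpenAI's training crawler.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Each user-agent token is matched on its own, so the GPTBot rule
# does not apply to OAI-SearchBot or ChatGPT-User.
verdicts = {
    bot: parser.can_fetch(bot, "/docs/")
    for bot in ("GPTBot", "OAI-SearchBot", "ChatGPT-User")
}
print(verdicts)
```

Only the GPTBot entry comes back as not fetchable; the search and user-fetch tokens are untouched by the training block.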

Q: Are anthropic-ai and Claude-Web still valid user-agents?

No. Anthropic deprecated both anthropic-ai and Claude-Web user-agent strings. Sites with robots.txt blocks targeting only those strings are not blocking the real ClaudeBot anymore. Anthropic has clarified that the active user-agents are ClaudeBot (training), Claude-SearchBot (search) and Claude-User (user-initiated fetches). The checker explicitly flags any rule still targeting deprecated strings so you know which entries are doing nothing.
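A minimal sketch of how such a flag could work, assuming a plain line-by-line scan. The function name `find_deprecated_agents` and the sample file are hypothetical and are not the checker's actual implementation.

```python
# Deprecated Anthropic tokens named above, lowercased for comparison.
DEPRECATED_AGENTS = {"anthropic-ai", "claude-web"}

def find_deprecated_agents(robots_txt: str) -> list[tuple[int, str]]:
    """Return (line_number, token) for each User-agent line naming a deprecated token."""
    hits = []
    for number, raw in enumerate(robots_txt.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        field, _, value = line.partition(":")
        token = value.strip()
        if field.strip().lower() == "user-agent" and token.lower() in DEPRECATED_AGENTS:
            hits.append((number, token))
    return hits

# Hypothetical file: two stale entries and one live one.
SAMPLE = """\
User-agent: anthropic-ai
Disallow: /

User-agent: Claude-Web
Disallow: /

User-agent: ClaudeBot
Disallow: /
"""

print(find_deprecated_agents(SAMPLE))
```

In this sample the scan reports lines 1 and 4; the ClaudeBot entry is the only one still doing any work.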

Q: Why does my robots.txt look correct but bots are still blocked?

Cloudflare and other CDNs commonly override the origin robots.txt with their own AI bot management rules. Approximately 27% of B2B SaaS and ecommerce sites are accidentally blocking major LLM crawlers via CDN-level rules without knowing it (Mersel/ziptie research). The checker fetches your robots.txt twice: once via the public URL and once with headers that match a known AI bot, then flags any difference. If the CDN serves a different file to the bot than to humans, that is the issue and the fix is in the CDN dashboard, not in your origin robots.txt.
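The double fetch described above can be approximated as follows. This is a sketch under stated assumptions: the two User-Agent header values are illustrative placeholders (real crawlers send longer strings), the function names are hypothetical, and a header-only comparison may miss CDN rules keyed on something other than the User-Agent header.

```python
import difflib
import urllib.request

# Illustrative header values, not the strings the checker actually sends.
BROWSER_UA = "Mozilla/5.0 (compatible; robots-check-sketch)"
BOT_UA = "GPTBot"

def fetch_robots(domain: str, user_agent: str, timeout: float = 10.0) -> str:
    """Fetch https://<domain>/robots.txt while presenting the given User-Agent."""
    request = urllib.request.Request(
        f"https://{domain}/robots.txt",
        headers={"User-Agent": user_agent},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")

def robots_diff(as_browser: str, as_bot: str) -> list[str]:
    """Return unified-diff lines; an empty list means both fetches matched."""
    return list(
        difflib.unified_diff(
            as_browser.splitlines(),
            as_bot.splitlines(),
            fromfile="browser",
            tofile="bot",
            lineterm="",
        )
    )

def cdn_override_suspected(domain: str) -> bool:
    """True when the bot-style fetch returns a different file than the browser-style fetch."""
    return bool(robots_diff(fetch_robots(domain, BROWSER_UA), fetch_robots(domain, BOT_UA)))
```

Calling `cdn_override_suspected("example-protocol.xyz")` would perform both fetches; a non-empty diff points at the CDN layer rather than the origin file.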

Q: What should a crypto site allow vs block?

The recommended default for most crypto and Web3 projects is selective access: allow high-value directories like /docs/, /blog/, /pricing/ and protocol explainers; block admin panels, internal tooling and gated community areas. For training vs search, the most common stance is allowing search bots (OAI-SearchBot, Claude-SearchBot, PerplexityBot) so your protocol shows up in AI answers, while blocking training bots (GPTBot, ClaudeBot, CCBot) to protect content from being absorbed into training datasets without attribution. Some teams allow training too; the call is yours but the choice should be deliberate, not accidental.

Q: Does blocking GPTBot affect Google rankings?

No. Blocking GPTBot has no measurable impact on Google search rankings (publisher network analysis cited by Playwire). Google uses a separate crawler (Googlebot) for search and a separate token (Google-Extended) for AI training control. You can block all AI training crawlers without affecting Google search at all. Many crypto sites assume blocking AI bots will hurt Google rankings and avoid the configuration question entirely; that is the most expensive form of inaction in AEO right now.

Q: Why is JavaScript rendering relevant to robots.txt?

69% of AI crawlers cannot execute JavaScript (Vercel/MERJ research). For crypto sites built as SPAs (single-page applications) or that render content client-side, the robots.txt may correctly allow GPTBot but the bot still sees a blank page because it cannot execute the React or Vue bundle. The checker runs a quick render test and flags pages that ship empty HTML to non-JS user-agents. The fix is server-side rendering or static prerendering for the AI bot user-agent strings; the AI Visibility audit module covers the full SPA optimization checklist.
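One rough way to approximate that render test is to measure how much visible text the raw HTML carries before any script runs. The sketch below uses Python's standard-library `html.parser`; the 200-character threshold and the name `looks_empty_without_js` are assumptions chosen for illustration, not the checker's real heuristic.

```python
from html.parser import HTMLParser

class VisibleText(HTMLParser):
    """Collect text that sits outside <script> and <style> elements."""
    SKIPPED = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())

def looks_empty_without_js(html: str, min_chars: int = 200) -> bool:
    """True when the HTML, read without executing scripts, has little visible text."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return len(" ".join(parser.chunks)) < min_chars

# Hypothetical SPA shell: all content would be injected by bundle.js.
SPA_SHELL = (
    "<html><head><title>App</title><script src='/bundle.js'></script></head>"
    "<body><div id='root'></div></body></html>"
)
print(looks_empty_without_js(SPA_SHELL))
```

A page that trips this check for a non-JS fetch is a candidate for server-side rendering or static prerendering.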

Q: What is the AI-referred traffic conversion lift?

AI-referred traffic converts 4.4x better than standard organic search (data aggregated by Superlines). For a crypto protocol, an AI-referred user has typically already vetted you through the AI answer (read the model output explaining your protocol, your security posture, your tokenomics) before clicking through. They arrive higher-intent than a generic Google searcher. Misconfigured robots.txt that excludes your site from AI answers is therefore not a 1-for-1 traffic loss; it is a disproportionate loss of high-converting traffic.
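A short worked example of why the loss is disproportionate. Only the 4.4x multiplier comes from the text above; the 5% traffic share is a hypothetical input, and the sketch assumes all other traffic converts at the organic baseline.

```python
# Hypothetical input: 5% of visits arrive from AI answers.
AI_TRAFFIC_SHARE = 0.05
# Conversion multiplier cited above for AI-referred traffic.
LIFT = 4.4

def ai_share_of_conversions(traffic_share: float, lift: float) -> float:
    """Fraction of all conversions that come from AI-referred visits.

    The baseline conversion rate cancels out, so only the traffic
    share and the lift are needed.
    """
    ai = traffic_share * lift
    other = 1.0 - traffic_share
    return ai / (ai + other)

share = ai_share_of_conversions(AI_TRAFFIC_SHARE, LIFT)
print(f"{share:.1%}")  # prints "18.8%"
```

Under these assumptions, losing 5% of visits would mean losing close to 19% of conversions.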

Q: How often should I re-check robots.txt configuration?

Quarterly is enough for most crypto sites. Re-check sooner if you migrate CDNs, change hosts, ship a major site rebuild or notice a sudden drop in AI citation rate (use the AI Citation Checker free tool to track that). Anthropic, OpenAI and Perplexity all introduce new user-agent strings periodically; quarterly review keeps you current. Pro tier ($25) re-audits every 90 days automatically as part of the full AI Visibility audit module.

Type your domain. We fetch your robots.txt, run a CDN override test and return a verdict for 12 AI bots across training, search and user-fetch roles. Catches the gotchas: deprecated anthropic-ai blocks doing nothing, Cloudflare overriding your origin file, training-vs-search confusion.

12 AI bots tested · CDN override detection · ~5 seconds end-to-end

// What you get back

A representative output for a DeFi protocol

Per-bot status across training, search and user-fetch roles. Then a copy-paste robots.txt block tuned for crypto AEO.

Web3 robots.txt check · 12 AI bots tested · CDN scan complete

example-protocol.xyz

3/12

3 of 12 AI bots correctly configured · 4 critical issues

Search bots are missing from your allowlist. Two deprecated user-agent strings are doing nothing. Cloudflare is serving a different robots.txt at the CDN. Four issues to address below, roughly 10 minutes of edits.

AI bot matrix

User-agent	Operator	Role	Status
`GPTBot`	OpenAI	Training	● Blocked
`OAI-SearchBot`	OpenAI	Search	○ Implicit allow
`ChatGPT-User`	OpenAI	User-fetch	○ Implicit allow
`ClaudeBot`	Anthropic	Training	● Blocked
`Claude-SearchBot`	Anthropic	Search	○ Implicit allow
`Claude-User`	Anthropic	User-fetch	○ Implicit allow
`anthropic-ai`	Anthropic (deprecated)	Training	⊘ Deprecated · noop
`Claude-Web`	Anthropic (deprecated)	Training	⊘ Deprecated · noop
`PerplexityBot`	Perplexity	Search	○ Implicit allow
`Perplexity-User`	Perplexity	User-fetch	○ Implicit allow
`CCBot`	Common Crawl	Training	● Blocked
`Google-Extended`	Google AI	Training	✓ Allowed

Top 4 issues to fix

Search bots only implicitly allowed. OAI-SearchBot, Claude-SearchBot, PerplexityBot have no explicit Allow rule. They fall back to the wildcard User-agent: * Allow: /, but operators recommend an explicit allowlist for clarity and to override CDN bot management defaults that block when no rule exists. Fix: add explicit Allow blocks per the snippet below.
Two deprecated user-agent strings (anthropic-ai, Claude-Web) are doing nothing. Anthropic deprecated both. Your robots.txt has Disallow rules targeting them; the real ClaudeBot ignores those rules. Fix: remove the deprecated entries (purely cosmetic, but the stale entries also mislead any human auditor reading the file).
Cloudflare is overriding your origin robots.txt. The CDN serves a different file to bot user-agents than to humans, with broader Disallow rules. This is the Cloudflare-managed AI crawler block enabled in Security > Bots. Fix: in the Cloudflare dashboard, disable "Manage your robots.txt" so your origin file takes precedence, or move the AI bot rules into Cloudflare directly.
No /admin/ or /internal/ Disallow rules. Best practice for crypto sites is selective access: allow /docs/, /blog/, /pricing/ but Disallow gated and admin areas. Yours has no Disallow on common gated paths. Fix: add Disallow: /admin/, Disallow: /internal/, Disallow: /api-docs/private/.

Recommended robots.txt for crypto AEO

# ============================================================
# robots.txt for crypto AEO baseline
# Block AI training crawlers, allow AI search crawlers
# ============================================================

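# --- AI search & user-fetch (allow for AEO visibility) ---
# A crawler that matches a named group ignores the User-agent: * group,
# so the gated-path rules are repeated here.
User-agent: OAI-SearchBot
User-agent: ChatGPT-User
User-agent: Claude-SearchBot
User-agent: Claude-User
User-agent: PerplexityBot
User-agent: Perplexity-User
Allow: /
Disallow: /admin/
Disallow: /internal/
Disallow: /api-docs/private/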

# --- AI training bulk crawlers (block by default) ---
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Bytespider
Disallow: /

# --- Default policy + crypto-specific Disallow ---
User-agent: *
Allow: /
Disallow: /admin/
Disallow: /internal/
Disallow: /api-docs/private/

Sitemap: https://example-protocol.xyz/sitemap.xml

Want the full AEO readiness audit, not just robots.txt?

Robots.txt is one of four AI Visibility readiness dimensions. The full audit also covers schema readiness, factual density and authority signals plus 30+ category prompts tested across all 3 LLMs. From $25 one-time per domain.

See AI Visibility module

Output above is representative. Actual checks return the real bot-by-bot status for the domain you submit.

// How it works

Three steps, ~5 seconds end-to-end

No signup. No credit card. Just a domain and a button.

Fetch the file

The tool fetches /robots.txt and a sample page. Both fetches happen with bot-style headers so we can detect Cloudflare-managed overrides at the CDN level.

Parse and verdict

Every User-agent block is parsed. Specificity rules are applied (more specific paths take precedence). Each of the 12 AI bots is tagged Allowed, Blocked, Implicit or Deprecated.
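The parse-and-tag step can be sketched as below. This is a simplified illustration, not the tool's implementation: it matches user-agent tokens exactly, treats paths as plain prefixes (no `*` or `$` wildcards), and the names `parse_groups`, `path_allowed` and `verdict` are hypothetical.

```python
DEPRECATED = {"anthropic-ai", "claude-web"}

def parse_groups(robots_txt: str) -> dict[str, list[tuple[str, str]]]:
    """Map each lowercased user-agent token to its (directive, path) rules."""
    groups: dict[str, list[tuple[str, str]]] = {}
    current: list[str] = []   # user-agent tokens of the group being read
    in_rules = False          # True once the current group has a rule line
    for raw in robots_txt.splitlines():
        field, _, value = raw.split("#", 1)[0].partition(":")
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            if in_rules:      # a rule line ended the previous group
                current, in_rules = [], False
            current.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif field in ("allow", "disallow") and current:
            in_rules = True
            if value:         # an empty Disallow matches nothing
                for agent in current:
                    groups[agent].append((field, value))
    return groups

def path_allowed(rules: list[tuple[str, str]], path: str) -> bool:
    """Longest matching prefix wins; Allow wins a tie; no match means allowed."""
    best_len, allowed = -1, True
    for directive, prefix in rules:
        if path.startswith(prefix):
            longer = len(prefix) > best_len
            tie_for_allow = len(prefix) == best_len and directive == "allow"
            if longer or tie_for_allow:
                best_len, allowed = len(prefix), directive == "allow"
    return allowed

def verdict(groups: dict[str, list[tuple[str, str]]], bot: str, path: str = "/") -> str:
    """Tag a bot as Deprecated, Allowed, Blocked or Implicit for the given path."""
    token = bot.lower()
    if token in DEPRECATED:
        return "Deprecated"
    if token in groups:
        return "Allowed" if path_allowed(groups[token], path) else "Blocked"
    # No group names the bot, so the wildcard group (if any) decides.
    return "Implicit" if path_allowed(groups.get("*", []), path) else "Blocked"

# Hypothetical input file.
SAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Google-Extended
Allow: /

User-agent: *
Allow: /
Disallow: /admin/
"""

groups = parse_groups(SAMPLE)
for bot in ("GPTBot", "anthropic-ai", "Google-Extended", "OAI-SearchBot"):
    print(bot, verdict(groups, bot))
```

The precedence rule shows up in the wildcard group: `/admin/` is a longer match than `/`, so `path_allowed(groups["*"], "/admin/keys")` is false even though `Allow: /` is listed first.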

Get fix snippet

Per-bot verdict, top issues with reasoning, plus a copy-paste robots.txt block tuned for crypto AEO that you can ship in one commit.

// Three bot roles, three different decisions

"AI bots" is not one thing. It is three.

Each AI company runs separate user-agents for training, search and user-fetch. Blocking one has zero effect on the others.

● Training

Bulk web scraping for model training

Crawls the open web to collect training data. Heavy bandwidth. No referral traffic back. Block to protect content from being absorbed into training datasets.

GPTBot · OpenAI

ClaudeBot · Anthropic

CCBot · Common Crawl

Google-Extended · Google AI

Common crypto stance

Block · No AI training

● Search

Indexing for AI search answers

Powers live answers inside ChatGPT, Claude and Perplexity. Drives referral traffic that converts 4.4x better than standard organic. Allow for AEO visibility.

OAI-SearchBot · OpenAI

Claude-SearchBot · Anthropic

PerplexityBot · Perplexity

Common crypto stance

Allow · Stay visible

● User-fetch

On-demand page retrieval

Fetches a specific page when a user asks the AI a question that needs fresh data. High-intent traffic. Allow so users get accurate, current answers about your protocol.

ChatGPT-User · OpenAI

Claude-User · Anthropic

Perplexity-User · Perplexity

Common crypto stance

Allow · Stay visible

27%

of B2B sites accidentally block LLM crawlers via CDN-level rules

Mersel · ziptie

69%

of AI crawlers cannot execute JavaScript on SPA sites

Vercel · MERJ

4.4x

conversion lift on AI-referred traffic vs standard organic search

Superlines aggregated

// Web3 robots.txt FAQ

Common questions about AI bot allowlisting

If you have a question not answered here, the full AI Visibility audit module page goes deeper.

What is the difference between training crawlers and search crawlers?

Same company, different bots, different jobs. Blocking one has zero effect on the other. The most common mistake is treating them as the same thing.

Training

Bulk scrape for model training. Common stance: block.

Search

Powers AI search answers. Common stance: allow.

Are anthropic-ai and Claude-Web still valid user-agents?

No. Anthropic deprecated both. Robots.txt rules targeting only these strings do nothing.

Deprecated: anthropic-ai · Claude-Web

Active: ClaudeBot · Claude-SearchBot · Claude-User

Why does my robots.txt look correct but bots are still blocked?

Cloudflare and other CDNs commonly override your origin file with their own AI bot rules. Fix is in the CDN dashboard, not the origin file.

27%

of B2B SaaS and ecommerce sites accidentally block major LLM crawlers via CDN-level rules. Mersel · ziptie research

What should a crypto site allow vs block?

Selective access is the recommended default. Block training bots, allow search and user-fetch bots, Disallow gated paths.

Allow

/docs/, /blog/, /pricing/, protocol explainers

Disallow

/admin/, /internal/, gated community areas

Does blocking GPTBot affect Google rankings?

No. Googlebot handles search; Google-Extended handles AI training. Block AI bots without affecting Google search.

Googlebot

Search rankings · independent

Google-Extended

AI training · safe to block

Why is JavaScript rendering relevant to robots.txt?

Robots.txt may correctly allow GPTBot, but the bot still sees a blank page on SPA sites. Fix: server-side or static prerendering.

69%

of AI crawlers cannot execute JavaScript. SPA-built crypto sites ship empty HTML to most AI bots. Vercel · MERJ research

What is the AI-referred traffic conversion lift?

AI-referred users arrive higher-intent. They have already read the AI's explanation of your protocol before clicking through.

4.4x

conversion lift on AI-referred traffic versus standard organic search. Superlines aggregated data

How often should I re-check robots.txt configuration?

Quarterly is enough. Re-check sooner after CDN migration, host change or major site rebuild. New AI user-agent strings appear regularly.

Quarterly cadence · After CDN swap · After site rebuild · If AEO drops

Track citation rate with the AI Citation Checker to catch regressions.

Robots.txt is one piece. Want the full AEO readiness picture?

Crawlux is our free audit tool that scans your full domain and gives you a complete report on what AI bots actually find: schema, content depth, FAQ structure and 5 more areas. Takes about 4 minutes. No signup, no credit card.

See AI Visibility module · Run AEO test

Free tier · No credit card · One-time pricing on paid tiers