Free tool · Catches deprecated user-agents · CDN-aware

Are you actually allowing the right AI bots? Most crypto sites are not.

Type your domain. We fetch your robots.txt, run a CDN override test and return a verdict on 12 AI bots across training, search and user-fetch roles. The check catches the common gotchas: deprecated anthropic-ai blocks that do nothing, Cloudflare overriding your origin file, and training-vs-search confusion.

12 AI bots tested · CDN override detection · ~5 seconds end-to-end

Free check · No credit card · No signup · Works on any public domain

// What you get back

A representative output for a DeFi protocol

Per-bot status across training, search and user-fetch roles. Then a copy-paste robots.txt block tuned for crypto AEO.

Web3 robots.txt check · 12 AI bots tested · CDN scan complete
example-protocol.xyz
3/12

3 of 12 AI bots correctly configured · 4 critical issues

Search bots are missing from your allowlist. Two deprecated user-agent strings are doing nothing. Cloudflare is serving a different robots.txt at the CDN. Four issues to address below, roughly 10 minutes of edits.

AI bot matrix

User-agent | Operator | Role | Status
GPTBot | OpenAI | Training | ● Blocked
OAI-SearchBot | OpenAI | Search | ○ Implicit allow
ChatGPT-User | OpenAI | User-fetch | ○ Implicit allow
ClaudeBot | Anthropic | Training | ● Blocked
Claude-SearchBot | Anthropic | Search | ○ Implicit allow
Claude-User | Anthropic | User-fetch | ○ Implicit allow
anthropic-ai | Anthropic (deprecated) | Training | ⊘ Deprecated · no-op
Claude-Web | Anthropic (deprecated) | Training | ⊘ Deprecated · no-op
PerplexityBot | Perplexity | Search | ○ Implicit allow
Perplexity-User | Perplexity | User-fetch | ○ Implicit allow
CCBot | Common Crawl | Training | ● Blocked
Google-Extended | Google AI | Training | ✓ Allowed

Top 4 issues to fix

  1. Search bots only implicitly allowed. OAI-SearchBot, Claude-SearchBot and PerplexityBot have no explicit Allow rule. They fall back to the wildcard group (User-agent: * with Allow: /), but operators recommend an explicit allowlist for clarity and to guard against CDN bot-management defaults that block bots with no matching rule. Fix: add explicit Allow blocks per the snippet below.
  2. Two deprecated user-agent strings (anthropic-ai, Claude-Web) are doing nothing. Anthropic deprecated both. Your robots.txt has Disallow rules targeting them, but the active ClaudeBot ignores those rules. Fix: remove the deprecated entries. The change is cosmetic as far as bots are concerned, but the stale entries mislead any human auditor reading the file.
  3. Cloudflare is overriding your origin robots.txt. The CDN serves bot user-agents a different file than it serves humans, with broader Disallow rules. This is the Cloudflare-managed AI crawler block enabled in Security > Bots. Fix: in the Cloudflare dashboard, disable "Manage your robots.txt" so your origin file takes precedence, or move the AI bot rules into Cloudflare directly.
  4. No /admin/ or /internal/ Disallow rules. Best practice for crypto sites is selective access: allow /docs/, /blog/ and /pricing/, but Disallow gated and admin areas. Your file has no Disallow on common gated paths. Fix: add Disallow: /admin/, Disallow: /internal/ and Disallow: /api-docs/private/. These rules only steer compliant crawlers and the file itself is public, so gated paths still need real authentication.
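Issues 2 and 4 above are simple enough to lint for. The sketch below is a hypothetical helper, not part of the tool: the deprecated set and gated-path list are assumptions taken from this page, and it checks whether each gated path is disallowed anywhere in the file rather than per user-agent group.

```python
# Hypothetical lint helper. Both constants are assumptions for this sketch.
DEPRECATED_AGENTS = {"anthropic-ai", "claude-web"}
GATED_PATHS = ("/admin/", "/internal/", "/api-docs/private/")


def lint_robots(text: str) -> list[str]:
    """Return warnings for deprecated user-agents and missing gated-path Disallows."""
    warnings = []
    disallowed = set()
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent" and value.lower() in DEPRECATED_AGENTS:
            warnings.append(f"Deprecated user-agent '{value}': rules here have no effect")
        elif field == "disallow" and value:
            disallowed.add(value)
    # File-wide check only: it does not verify which group each Disallow sits in.
    for path in GATED_PATHS:
        if path not in disallowed:
            warnings.append(f"No 'Disallow: {path}' rule found")
    return warnings


sample = "User-agent: anthropic-ai\nDisallow: /\n\nUser-agent: *\nAllow: /\nDisallow: /admin/\n"
for warning in lint_robots(sample):
    print(warning)
# Prints the anthropic-ai warning, then missing rules for /internal/ and /api-docs/private/
```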

Recommended robots.txt for crypto AEO

# ============================================================
# robots.txt for crypto AEO baseline
# Block AI training crawlers, allow AI search crawlers
# ============================================================

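# --- AI search & user-fetch (allow for AEO visibility) ---
# A crawler obeys only its most specific matching group, so these
# named bots do NOT inherit the Disallow rules under User-agent: *.
# The gated-path rules are repeated here for that reason.
User-agent: OAI-SearchBot
User-agent: ChatGPT-User
User-agent: Claude-SearchBot
User-agent: Claude-User
User-agent: PerplexityBot
User-agent: Perplexity-User
Allow: /
Disallow: /admin/
Disallow: /internal/
Disallow: /api-docs/private/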

# --- AI training bulk crawlers (block by default) ---
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Bytespider
Disallow: /

# --- Default policy + crypto-specific Disallow ---
User-agent: *
Allow: /
Disallow: /admin/
Disallow: /internal/
Disallow: /api-docs/private/

Sitemap: https://example-protocol.xyz/sitemap.xml

Want the full AEO readiness audit, not just robots.txt?

Robots.txt is one of four AI Visibility readiness dimensions. The full audit also covers schema readiness, factual density and authority signals plus 30+ category prompts tested across all 3 LLMs. From $25 one-time per domain.

See AI Visibility module

Output above is representative. Actual checks return the real bot-by-bot status for the domain you submit.

// How it works

Three steps, ~5 seconds end-to-end

No signup. No credit card. Just a domain and a button.

01

Fetch the file

The tool fetches /robots.txt and a sample page. Both fetches happen with bot-style headers so we can detect Cloudflare-managed overrides at the CDN level.
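A rough way to reproduce the override test yourself is to request /robots.txt twice, once with a browser-style User-Agent and once with a bot-style one, and compare the bodies. This is a minimal sketch, not the tool's implementation: both User-Agent strings and all function names are illustrative assumptions. CDNs can also key on verified crawler IP ranges, so a spoofed header only approximates what the real bot receives.

```python
import hashlib
import urllib.request

# Illustrative User-Agent strings (assumptions); real crawlers send their own versioned strings.
BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
BOT_UA = "Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)"


def fetch_robots(domain: str, user_agent: str, timeout: float = 10.0) -> bytes:
    """Fetch https://<domain>/robots.txt while presenting the given User-Agent header."""
    req = urllib.request.Request(
        f"https://{domain}/robots.txt", headers={"User-Agent": user_agent}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()


def digest(body: bytes) -> str:
    """Hash a body after normalising line endings and trailing whitespace."""
    normalised = b"\n".join(line.rstrip() for line in body.splitlines())
    return hashlib.sha256(normalised).hexdigest()


def differs_by_user_agent(bodies: dict[str, bytes]) -> bool:
    """True when at least two user-agents received different robots.txt content."""
    return len({digest(body) for body in bodies.values()}) > 1


def check_domain(domain: str) -> str:
    """Fetch robots.txt as a browser and as a bot, then summarise the comparison."""
    bodies = {}
    for label, ua in (("browser", BROWSER_UA), ("bot", BOT_UA)):
        try:
            bodies[label] = fetch_robots(domain, ua)
        except OSError as exc:  # URLError, HTTPError (e.g. an edge 403) and timeouts
            return f"{label} fetch failed: {exc}"
    if differs_by_user_agent(bodies):
        return "Different robots.txt per user-agent: CDN override suspected"
    return "Same robots.txt for both user-agents"
```

Calling check_domain("example-protocol.xyz") returns a one-line summary. An edge-level 403 for the bot request shows up as a failed fetch rather than a different file, which is itself a strong hint that the CDN is intervening.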

02

Parse and verdict

Every User-agent block is parsed. Specificity rules are applied (more specific paths take precedence). Each of the 12 AI bots is tagged Allowed, Blocked, Implicit or Deprecated.
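The parse-and-verdict step can be sketched in Python. This is a minimal illustration, not the tool's implementation: it assumes RFC 9309 semantics (case-insensitive group selection with fallback to User-agent: *, longest matching path wins, Allow wins a tie), and the function names and the "Implicit block" label are hypothetical. Python's standard urllib.robotparser applies the first matching rule in file order rather than the longest, which is why matching is done by hand here.

```python
import re

# Sketch assumption: the two strings this page lists as deprecated Anthropic agents.
DEPRECATED = {"anthropic-ai", "claude-web"}


def parse_groups(text: str) -> dict[str, list[tuple[bool, str]]]:
    """Map each lower-cased user-agent token to its (is_allow, path_pattern) rules.

    Consecutive User-agent lines share one group; groups naming the same agent are merged.
    """
    groups: dict[str, list[tuple[bool, str]]] = {}
    agents: list[str] = []
    in_rules = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a User-agent line after rules opens a new group
                agents, in_rules = [], False
            agents.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif field in ("allow", "disallow") and agents:
            in_rules = True
            if value:  # an empty value restricts nothing, so it adds no rule
                for agent in agents:
                    groups[agent].append((field == "allow", value))
    return groups


def _compile(pattern: str) -> re.Pattern:
    """Turn a path pattern ('*' wildcard, optional trailing '$' anchor) into a regex."""
    anchored = pattern.endswith("$")
    core = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(piece) for piece in core.split("*"))
    return re.compile(regex + ("$" if anchored else ""))


def is_allowed(rules: list[tuple[bool, str]], path: str) -> bool:
    """Longest matching pattern wins; Allow wins a tie; no match means allowed."""
    best_len, allowed = -1, True
    for is_allow, pattern in rules:
        if _compile(pattern).match(path):
            length = len(pattern)
            if length > best_len or (length == best_len and is_allow):
                best_len, allowed = length, is_allow
    return allowed


def verdict(groups: dict[str, list[tuple[bool, str]]], bot: str, path: str = "/") -> str:
    """Classify one bot roughly the way the sample matrix does (labels are this sketch's own)."""
    token = bot.lower()
    if token in DEPRECATED:
        return "Deprecated"
    if token in groups:
        return "Allowed" if is_allowed(groups[token], path) else "Blocked"
    if "*" in groups:
        return "Implicit allow" if is_allowed(groups["*"], path) else "Implicit block"
    return "Implicit allow"  # no matching group and no wildcard group


sample = "User-agent: GPTBot\nUser-agent: ClaudeBot\nDisallow: /\n\nUser-agent: *\nAllow: /\nDisallow: /admin/\n"
g = parse_groups(sample)
print(verdict(g, "GPTBot"), verdict(g, "OAI-SearchBot"), verdict(g, "OAI-SearchBot", "/admin/x"))
# Blocked Implicit allow Implicit block
```

Evaluating at "/" gives the site-wide verdict; passing a gated path such as "/admin/x" shows whether a bot that falls back to the wildcard group still respects it.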

03

Get fix snippet

Per-bot verdict, top issues with reasoning, plus a copy-paste robots.txt block tuned for crypto AEO that you can ship in one commit.

// Three bot roles, three different decisions

"AI bots" is not one thing. It is three.

Each AI company runs separate user-agents for training, search and user-fetch. Blocking one has zero effect on the others.
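Python's standard urllib.robotparser is enough to see this group separation in action. A minimal sketch, assuming a sample file that gives only GPTBot its own group; the URL uses the sample domain from the report above.

```python
from urllib.robotparser import RobotFileParser

# Sample file (assumption): only OpenAI's training crawler gets its own group.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse() also marks the file as read

url = "https://example-protocol.xyz/docs/"
for agent in ("GPTBot", "OAI-SearchBot", "ChatGPT-User"):
    status = "allowed" if parser.can_fetch(agent, url) else "blocked"
    print(f"{agent}: {status}")
# GPTBot: blocked, OAI-SearchBot: allowed, ChatGPT-User: allowed
```

The two OpenAI search and user-fetch agents match no named group, so they fall through to the wildcard group and stay allowed.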

● Training

Bulk web scraping for model training

Crawls the open web to collect training data. Heavy bandwidth. No referral traffic back. Block to protect content from being absorbed into training datasets.

GPTBot · OpenAI
ClaudeBot · Anthropic
CCBot · Common Crawl
Google-Extended · Google AI

Common crypto stance

Block · No AI training

● Search

Indexing for AI search answers

Powers live answers inside ChatGPT, Claude and Perplexity. Drives referral traffic that converts 4.4x better than standard organic. Allow for AEO visibility.

OAI-SearchBot · OpenAI
Claude-SearchBot · Anthropic
PerplexityBot · Perplexity

Common crypto stance

Allow · Stay visible

● User-fetch

On-demand page retrieval

Fetches a specific page when a user asks the AI a question that needs fresh data. High-intent traffic. Allow so users get accurate, current answers about your protocol.

ChatGPT-User · OpenAI
Claude-User · Anthropic
Perplexity-User · Perplexity

Common crypto stance

Allow · Stay visible

27%

of B2B SaaS and ecommerce sites accidentally block major LLM crawlers via CDN-level rules

Mersel · ziptie

69%

of AI crawlers cannot execute JavaScript, so single-page apps look empty to them

Vercel · MERJ

4.4x

conversion lift on AI-referred traffic vs standard organic search

Superlines aggregated data

// Web3 robots.txt FAQ

Common questions about AI bot allowlisting

If you have a question not answered here, the full AI Visibility audit module page goes deeper.

What is the difference between training crawlers and search crawlers?

Same company, different bots, different jobs. Blocking one has zero effect on the other. The most common mistake is treating them as the same thing.

Training

Bulk scrape for model training. Common stance: block.

Search

Powers AI search answers. Common stance: allow.

Are anthropic-ai and Claude-Web still valid user-agents?

No. Anthropic deprecated both. Robots.txt rules targeting only these strings do nothing. The active Anthropic user-agents are:

ClaudeBot · Claude-SearchBot · Claude-User
Deprecated (no effect): anthropic-ai · Claude-Web

Why does my robots.txt look correct but bots are still blocked?

Cloudflare and other CDNs commonly override your origin file with their own AI bot rules. The fix is in the CDN dashboard, not the origin file.

27%
of B2B SaaS and ecommerce sites accidentally block major LLM crawlers via CDN-level rules. Mersel · ziptie research

What should a crypto site allow vs block?

Selective access is the recommended default. Block training bots, allow search and user-fetch bots, and Disallow gated paths.

Allow

/docs/, /blog/, /pricing/, protocol explainers

Disallow

/admin/, /internal/, gated community areas

Does blocking GPTBot affect Google rankings?

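No. GPTBot is OpenAI's crawler and plays no part in Google Search. Google ranks pages from Googlebot crawls; Google-Extended is a separate token that controls whether your content is used for Google's AI models. Blocking GPTBot or Google-Extended does not affect Google Search rankings.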

Googlebot

Search rankings · independent

Google-Extended

AI training · safe to block

Why is JavaScript rendering relevant to robots.txt?

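Robots.txt may correctly allow an AI search bot such as OAI-SearchBot, but on a client-rendered SPA the bot still sees a blank page, because it does not run the JavaScript that builds the content. Fix: server-side rendering or static prerendering.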
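One quick way to approximate what a non-rendering crawler gets is to fetch the page without a browser and count the words in the raw HTML. A minimal sketch using Python's standard html.parser; the class and function names are hypothetical, and any threshold for "too empty" is a judgment call rather than a documented cutoff.

```python
from html.parser import HTMLParser


class VisibleTextCounter(HTMLParser):
    """Collect text outside <script>, <style> and <template> in raw server HTML."""

    SKIP = {"script", "style", "template"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_word_count(html: str) -> int:
    """Approximate the words a crawler that skips JavaScript sees in the initial HTML."""
    counter = VisibleTextCounter()
    counter.feed(html)
    counter.close()  # flush any buffered text
    return sum(len(chunk.split()) for chunk in counter.chunks)


spa = '<html><head><script src="/app.js"></script></head><body><div id="root"></div></body></html>'
ssr = "<html><body><h1>Example Protocol</h1><p>Non-custodial lending on Ethereum.</p></body></html>"
print(visible_word_count(spa), visible_word_count(ssr))  # 0 6
```

A page that reads fully in a browser but scores near zero here is being assembled client-side, which is the case prerendering addresses.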

69%
of AI crawlers cannot execute JavaScript. SPA-built crypto sites ship empty HTML to most AI bots. Vercel · MERJ research

What is the AI-referred traffic conversion lift?

AI-referred users arrive higher-intent. They have already read the AI's explanation of your protocol before clicking through.

4.4x
conversion lift on AI-referred traffic versus standard organic search. Superlines aggregated data

How often should I re-check my robots.txt configuration?

Quarterly is usually enough, since new AI user-agent strings appear regularly. Re-check sooner after a CDN migration, host change or major site rebuild.

Quarterly cadence · After CDN swap · After site rebuild · If AEO drops

Track citation rate with the AI Citation Checker to catch regressions.

Robots.txt is one piece. Want the full AEO readiness picture?

Crawlux, our free audit tool, scans your full domain and reports what AI bots actually find: schema, content depth, FAQ structure and 5 more areas. It takes about 4 minutes. No signup, no credit card.

Free tier · No credit card · One-time pricing on paid tiers