Web3 robots.txt for AI bots: the configuration guide for crypto sites
67% of crypto sites accidentally block AI bots. Complete guide to allowing GPTBot, ClaudeBot and PerplexityBot. Sample robots.txt, Cloudflare and Vercel-specific fixes, plus verification steps.
Why 67% of crypto sites accidentally block AI bots
Crawlux scanned 207 crypto sites in March 2026. 139 (67%) block at least one major AI search bot. The blocking is rarely intentional. The most common causes: copied robots.txt templates that disallow AI crawlers by default, CDN-level managed rulesets enabled without reviewing the bot list, and framework templates (especially on Vercel) that ship with a restrictive robots.txt.
The cost of an accidental block is total. A site that blocks GPTBot has a zero ChatGPT citation rate by definition. No amount of clean schema, strong backlinks or good content compensates for an unreachable site. The chain is mechanical: no crawl means no index means no citation. The companion press release covers the full scan data.
The 13 AI bots crypto sites should allow
The current canonical list of AI bots that crypto sites need to allow, organized by parent company:

OpenAI: GPTBot, OAI-SearchBot, ChatGPT-User
Anthropic: ClaudeBot, anthropic-ai, Claude-Web
Perplexity: PerplexityBot, Perplexity-User
Google (AI Overviews and Gemini): Google-Extended
Apple Intelligence: Applebot-Extended
Common Crawl: CCBot
ByteDance (Doubao): Bytespider
Meta AI: Meta-ExternalAgent
The list updates quarterly. Most teams overlook ChatGPT-User (used when a ChatGPT user pastes a URL and asks the model to read it) and OAI-SearchBot (the search-specific crawler distinct from the training-data crawler GPTBot). Both are needed for full ChatGPT citation coverage.
Sample Web3 robots.txt template
The recommended starting template allows all 13 AI bots while preserving site-specific deny rules. Place it at the site root:

User-agent: GPTBot
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: anthropic-ai
Allow: /

User-agent: Claude-Web
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Perplexity-User
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: Applebot-Extended
Allow: /

User-agent: CCBot
Allow: /

User-agent: Bytespider
Allow: /

User-agent: Meta-ExternalAgent
Allow: /

User-agent: *
Disallow: /admin/
Disallow: /internal/

Sitemap: https://yourdomain.com/sitemap.xml
The template ships as a copy-paste-ready block in the Crawlux Web3 Robots.txt Checker. The tool generates the version tuned to your specific site, preserving deny rules you intend to keep while adding the AI bot allowances.
Cloudflare-specific: what to toggle
Cloudflare's WAF managed rulesets include an "AI Scrapers and Crawlers" rule that is enabled by default on new accounts. The rule blocks GPTBot, ClaudeBot, PerplexityBot and several others at the edge. Even if your robots.txt allows the bots, Cloudflare blocks the request before it reaches your origin server.
The fix: open Security > WAF > Managed Rules in the Cloudflare dashboard. Find the "AI Scrapers and Crawlers" rule. Set it to "Off" rather than "Block" or "Challenge". If you have specific scraper concerns, replace it with targeted rules using IP ranges or user-agent strings rather than the broad managed ruleset.
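For the targeted replacement, a custom WAF rule with a narrow expression is the usual shape. The user-agent substring and IP range below are placeholders for whatever scraper you actually want to stop, not recommendations:

(http.user_agent contains "ExampleScraperBot") or (ip.src in {203.0.113.0/24})

Set the action for that one expression to Block or Managed Challenge, so only the traffic you named is affected rather than every bot Cloudflare classifies as an AI crawler.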
Cloudflare also offers an "AI Audit" feature that lets you allow specific bots while blocking others. This is the recommended pattern for sites that want some AI bots (search-focused) but not others (training-focused). The companion press release covers the toggle in more detail.
Vercel and Next.js specific: template fixes
Vercel deployments using the default Next.js template ship with a robots.txt that disallows AI crawlers. The file lives at /public/robots.txt or is generated by /app/robots.ts depending on framework version. Replace the contents with the Web3 template above.
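For App Router projects that generate the file, a minimal app/robots.ts along these lines produces the template above. This is a sketch assuming a Next.js version with metadata route support (13.3+); swap in your own deny paths and sitemap URL:

import type { MetadataRoute } from 'next'

// AI crawlers to allow explicitly (the 13-bot list from the template above).
const AI_BOTS = [
  'GPTBot', 'OAI-SearchBot', 'ChatGPT-User',
  'ClaudeBot', 'anthropic-ai', 'Claude-Web',
  'PerplexityBot', 'Perplexity-User',
  'Google-Extended', 'Applebot-Extended',
  'CCBot', 'Bytespider', 'Meta-ExternalAgent',
]

export default function robots(): MetadataRoute.Robots {
  return {
    rules: [
      // One allow-all group per AI bot.
      ...AI_BOTS.map((userAgent) => ({ userAgent, allow: '/' })),
      // Default group for everything else, preserving existing deny paths.
      { userAgent: '*', disallow: ['/admin/', '/internal/'] },
    ],
    sitemap: 'https://yourdomain.com/sitemap.xml',
  }
}

Use one mechanism or the other; keeping both a static public/robots.txt and a generated app/robots.ts invites confusion about which version is actually served.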
Additionally, check the middleware. If you have middleware.ts at the project root, ensure it does not intercept bot user-agents. Some templates include bot-blocking middleware for "performance" that incidentally blocks legitimate AI crawlers. Comment out the bot rules or scope them to specific paths only.
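As a rough illustration of that scoping, a middleware.ts along these lines lets known AI crawler user-agents pass straight through and keeps the custom bot logic off content pages. The bot list is abbreviated and the matcher path is a placeholder:

import { NextResponse } from 'next/server'
import type { NextRequest } from 'next/server'

// User-agent substrings that must never be caught by custom bot handling.
const ALLOWED_AI_BOTS = ['GPTBot', 'OAI-SearchBot', 'ChatGPT-User', 'ClaudeBot', 'PerplexityBot']

export function middleware(request: NextRequest) {
  const ua = request.headers.get('user-agent') ?? ''

  // Known AI crawlers go straight through to the page.
  if (ALLOWED_AI_BOTS.some((bot) => ua.includes(bot))) {
    return NextResponse.next()
  }

  // ...existing bot-handling or rewrite logic for other traffic...
  return NextResponse.next()
}

// Scope the middleware so it never runs on robots.txt or public content pages.
export const config = {
  matcher: ['/app-only/:path*'], // placeholder path; adjust to the routes that actually need it
}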
Vercel also offers an "Edge Config" feature for runtime config. If you use it for bot policy, audit the current rules and remove any AI bot blocks.
AWS CloudFront and other CDNs
CloudFront with AWS WAF often runs the "AWS Managed Rules - Common Rule Set" which can include bot-blocking patterns. Review the active rule groups and disable any rule that targets AI crawler user-agents. The specific rule names change with AWS releases; check the documentation for current naming.
Fastly, KeyCDN and Bunny CDN typically pass bot requests through by default but may have origin-shield rules that interfere. Check your edge-level access control lists for any rules that filter by user-agent string.
How to verify the bots actually got through
Updating robots.txt is necessary but not sufficient; the bots need to actually reach your origin server. There are two verification methods. First, check server logs for the bot user-agent strings (filter for "GPTBot", "ClaudeBot" and "PerplexityBot"). You should see legitimate crawl requests within 7 days of allowing a bot. Second, use the Crawlux Web3 Robots.txt Checker, which attempts an actual crawl as each bot user-agent and compares the response to what robots.txt suggests.
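For a quick self-check without waiting on logs, a small script can request a page with a bot user-agent and compare the result to a normal browser request. A minimal sketch, assuming Node 18+ for the global fetch and an approximate GPTBot user-agent string; note it only catches user-agent based blocking, not rules keyed to the bot's real IP ranges:

// Request the same page as GPTBot and as a normal browser, then compare status codes.
const TARGET = 'https://yourdomain.com/' // replace with a real page on your domain

async function fetchAs(userAgent: string): Promise<number> {
  const res = await fetch(TARGET, { headers: { 'User-Agent': userAgent }, redirect: 'follow' })
  return res.status
}

async function main() {
  const botStatus = await fetchAs('Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)')
  const browserStatus = await fetchAs('Mozilla/5.0 (Windows NT 10.0; Win64; x64)')
  console.log({ botStatus, browserStatus })
  if (botStatus !== browserStatus) {
    console.warn('Bot user-agent gets a different response - an edge rule is likely interfering.')
  }
}

main()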
For sites behind CDNs, the verification step is especially important because the robots.txt update may not propagate through every edge layer. Some CDN configurations cache robots.txt for hours; force a cache purge after updating so bots see the new version on the next request.
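If the CDN in front of the site is Cloudflare, for example, a single-file purge clears the cached robots.txt without flushing everything else. A sketch against Cloudflare's v4 API, assuming Node 18+ for fetch, a zone ID and an API token with cache-purge permission in environment variables:

// Purge only robots.txt from the Cloudflare cache after an update.
const ZONE_ID = process.env.CF_ZONE_ID!
const API_TOKEN = process.env.CF_API_TOKEN!

async function purgeRobotsTxt() {
  const res = await fetch(`https://api.cloudflare.com/client/v4/zones/${ZONE_ID}/purge_cache`, {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${API_TOKEN}`,
      'Content-Type': 'application/json',
    },
    // Purge by exact URL so the rest of the cache stays warm.
    body: JSON.stringify({ files: ['https://yourdomain.com/robots.txt'] }),
  })
  console.log(await res.json())
}

purgeRobotsTxt()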
Monitoring for drift
Robots.txt drift is a known failure mode. A developer updates the file as part of an unrelated deploy and accidentally removes the AI bot allowances. The Crawlux Pro AI Visibility Audit monitors robots.txt daily and alerts on changes that affect bot policy. For teams that prefer manual monitoring, the free Web3 Robots.txt Checker can be run on a cron or via a GitHub Action to validate weekly.
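For the manual route, a small validation script run weekly from cron or CI is enough to catch drift. A rough string-level sketch rather than a full robots.txt parser; it assumes the template format above and exits non-zero when a required group is missing so the scheduled job fails visibly:

// Weekly drift check: fail if any required AI bot lost its explicit robots.txt group.
const ROBOTS_URL = 'https://yourdomain.com/robots.txt'
const REQUIRED_BOTS = [
  'GPTBot', 'OAI-SearchBot', 'ChatGPT-User',
  'ClaudeBot', 'anthropic-ai', 'Claude-Web',
  'PerplexityBot', 'Perplexity-User',
  'Google-Extended', 'Applebot-Extended',
  'CCBot', 'Bytespider', 'Meta-ExternalAgent',
]

async function main() {
  const body = (await (await fetch(ROBOTS_URL)).text()).toLowerCase()
  const missing = REQUIRED_BOTS.filter(
    (bot) => !body.includes(`user-agent: ${bot.toLowerCase()}`)
  )
  if (missing.length > 0) {
    console.error(`robots.txt drift detected, missing groups for: ${missing.join(', ')}`)
    process.exit(1) // non-zero exit fails the cron job or CI step
  }
  console.log('robots.txt still allows all required AI bots')
}

main()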
Key takeaway
Cloudflare's "AI Scrapers and Crawlers" managed rule blocks the legitimate AI search bots at the edge. Your robots.txt does not matter if the bot never reaches it.
About Crawlux
Crawlux is the world's first automated SEO audit tool built for Web3, DeFi and blockchain. The platform runs 23 analyzers across 6 check groups including AI visibility testing across ChatGPT, Perplexity and Claude. Free tier available. Paid tiers from $25 per audit. More at crawlux.com.
Frequently asked questions
If I block Bytespider, do I lose ByteDance traffic in China?
You lose Doubao (ByteDance's AI assistant) citation eligibility. Doubao has a growing share of the Chinese crypto market. If China is not a market for you, blocking Bytespider has minimal cost. If China matters, allow it.
What about wallet-specific bots like Phantom or MetaMask?
These are not crawlers. They are user-agent strings for wallet browsers. Allow them at the same level as regular browsers, not as bots.
Should I block AI training bots while allowing search bots?
Partially. OpenAI splits the two, with GPTBot for training data and OAI-SearchBot for search, so one can be blocked without the other. Many other providers use a single crawler for both purposes, so blocking training there also blocks search citation eligibility. The trade-off generally favors allowing both.
Can I rate-limit AI bots?
Yes. Many AI bots respect the Crawl-delay directive. Set Crawl-delay: 5 in robots.txt to slow the crawl rate if origin load is a concern. Do not set it higher than 10 or some bots will deprioritize indexing.
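Applied per bot group, the directive looks like this (a 5-second delay for GPTBot, as an example):

User-agent: GPTBot
Allow: /
Crawl-delay: 5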
RUN YOUR FIRST AUDIT FREE
See Crawlux on your own crypto site.
No signup, no credit card. Full Web3-tuned audit report in 60 seconds.
