Tutorial 03 · 10 minutes · robots.txt

Allow AI crawlers in robots.txt.

If ChatGPT, Perplexity and Claude cannot fetch your pages, they cannot cite you. Most crypto sites block AI crawlers by accident, usually through a Disallow rule inherited from a template. This tutorial walks you to a robots.txt that explicitly allows the major AI crawlers while keeping the rest of your existing rules intact.

// Why this matters

67% of crypto sites block at least one major AI crawler.

Across 200 crypto sites Crawlux scanned in early 2026, 67% had at least one major AI crawler blocked in robots.txt. Most blocks are accidental: a wildcard Disallow rule inherited from a generic CMS template. The cost is real. A site with GPTBot blocked has zero chance of being cited by ChatGPT regardless of how good its schema is. The fix is one robots.txt edit. The lift is consistent: median 16-point AI Visibility sub-score improvement within 9 days post-deploy.

// Step 1 of 4

Check what your robots.txt currently does.

Open Web3 Robots.txt Checker. Enter your domain. The checker fetches yourdomain.com/robots.txt and reports the access status for 14 distinct AI crawler user agents. The output is a per-crawler matrix:

checker-output.txt

GPTBot          BLOCKED   (matched by User-agent: * with Disallow: /)
ClaudeBot       BLOCKED   (matched by User-agent: * with Disallow: /)
PerplexityBot   BLOCKED   (matched by User-agent: * with Disallow: /)
CCBot           BLOCKED   (matched by User-agent: * with Disallow: /)
Bytespider      BLOCKED   (matched by User-agent: * with Disallow: /)
Google-Extended BLOCKED   (matched by User-agent: * with Disallow: /)

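Note the cause column. The most common pattern is a single wildcard Disallow that blocks every bot, AI crawlers included, as a side effect. The fix is to give each AI crawler its own User-agent group with an Allow rule. A crawler obeys the group that names it most specifically and falls back to the User-agent: * group only when no named group matches, so a named group overrides the wildcard for that crawler wherever it sits in the file. Placing the named groups above the wildcard is a readability convention, not a requirement.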
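The group-selection behaviour can be reproduced offline. The sketch below is a minimal illustration, not the checker's actual implementation: it uses Python's standard-library urllib.robotparser, the crawler list is just the six names from the sample output, and yourdomain.com is a placeholder. It prints one matrix for the wildcard-only file and one for the same file with a single named group added after the wildcard.

```python
from urllib.robotparser import RobotFileParser

# Assumption: the six crawlers from the sample output, not the checker's full list of 14.
AI_CRAWLERS = ["GPTBot", "ClaudeBot", "PerplexityBot",
               "CCBot", "Bytespider", "Google-Extended"]


def access_matrix(robots_txt, url="https://yourdomain.com/"):
    """Return {crawler: True/False} for fetching `url` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, url) for bot in AI_CRAWLERS}


# The accidental pattern: one wildcard group that disallows everything.
BEFORE = """\
User-agent: *
Disallow: /
"""

# Same file plus one named group, deliberately placed AFTER the wildcard.
WITH_ONE_NAMED_GROUP = BEFORE + """
User-agent: GPTBot
Allow: /
"""

for label, text in [("wildcard only", BEFORE),
                    ("wildcard + GPTBot group", WITH_ONE_NAMED_GROUP)]:
    print(label)
    for bot, allowed in access_matrix(text).items():
        print(f"  {bot:<16}{'ALLOWED' if allowed else 'BLOCKED'}")
```

In the second matrix only GPTBot flips to ALLOWED. Every crawler without a named group still falls through to the wildcard, which is why Step 2 lists each crawler separately.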

// Step 2 of 4

Edit robots.txt.

Add the following block to the top of your robots.txt, before any wildcard rules:

robots.txt

# AI crawler access — explicit allow for citation eligibility
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: Claude-Web
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: CCBot
Allow: /

User-agent: Bytespider
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: Applebot-Extended
Allow: /

# Your existing rules continue below
User-agent: *
Disallow: /admin/
Disallow: /api/internal/

Sitemap: https://yourdomain.com/sitemap.xml

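A few things to notice. Each AI crawler gets its own User-agent block with an explicit Allow. The wildcard block keeps your existing rules for traditional crawlers and admin paths. The Sitemap directive at the bottom is independent of any User-agent block and helps every crawler (including AI ones) discover your full URL set. One caveat: a crawler that matches a named block follows only that block, so the AI crawlers listed above are not bound by the /admin/ and /api/internal/ Disallow lines in the wildcard block. If they should stay out of those paths, repeat the Disallow lines inside each named block.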
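Before deploying, the edited file can be sanity-checked locally. The following is a sketch under a few assumptions: it uses Python's urllib.robotparser (whose matching may differ in detail from any individual crawler's), it abbreviates the file to two named blocks, and SomeOtherBot is a made-up name standing in for any crawler without its own block.

```python
from urllib.robotparser import RobotFileParser

# Abbreviated version of the file above: two named blocks instead of eight.
ROBOTS_TXT = """\
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: *
Disallow: /admin/
Disallow: /api/internal/

Sitemap: https://yourdomain.com/sitemap.xml
"""

SITE = "https://yourdomain.com"  # placeholder domain

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# "SomeOtherBot" is hypothetical: it has no named block, so the wildcard applies.
checks = [
    ("GPTBot", "/"),
    ("GPTBot", "/admin/"),
    ("ClaudeBot", "/"),
    ("SomeOtherBot", "/"),
    ("SomeOtherBot", "/admin/"),
]
results = {(agent, path): parser.can_fetch(agent, SITE + path)
           for agent, path in checks}

for (agent, path), allowed in results.items():
    print(f"{agent:<14}{path:<10}{'ALLOWED' if allowed else 'BLOCKED'}")

# site_maps() needs Python 3.8+; it returns None when no Sitemap line exists.
sitemaps = parser.site_maps()
print("Sitemaps:", sitemaps)
```

The GPTBot /admin/ row comes back ALLOWED, because a crawler with its own block does not inherit the wildcard block's Disallow lines. If that is not what you want, add the Disallow lines to the named blocks and re-run the check.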

// Why Allow not just removal of Disallow

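Under the robots.txt standard, a path with no matching Disallow is already allowed, so an explicit Allow is often redundant. The named blocks earn their place for two other reasons. First, a crawler with its own User-agent block stops reading the wildcard block, so a later template change that adds Disallow: / under User-agent: * will not silently block it. Second, the file documents your intent for whoever edits it next. Best practice for 2026 is explicit per-crawler Allow rules even where redundant with implicit access.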

// Step 3 of 4

Deploy and re-check.

Deploy the robots.txt change to production. Crawlers typically cache robots.txt for up to 24 hours, and a CDN in front of your site may cache it too, so allow a day for the change to take effect. Re-run the Web3 Robots.txt Checker. All 14 AI crawler user agents should now show ALLOWED.

// Step 4 of 4

Wait for re-crawl, then re-audit.

AI crawlers re-fetch robots.txt on their own cadence, typically within 7 to 14 days for major sites. Across the Crawlux audit corpus, the lift in AI Visibility score from this fix shows up at a median of 9 days. Re-run a full Crawlux audit 14 days post-deploy to measure the impact. Expected pattern:

AI Visibility sub-score: +12 to +18 points (median +16)
D04 analyzer (AI bot policy): from flag to pass
D01 / D02 / D03 cite rates: roughly double from pre-fix baseline
Overall score: +5 to +8 points (weighted contribution)

// What about Cloudflare AI bot blocking?

If you use Cloudflare's "Block AI Bots" feature.

Cloudflare ships an "AI bots" managed rule that blocks AI crawlers at the WAF layer regardless of what robots.txt says. If your site is behind Cloudflare with this rule enabled, crawler requests are rejected at the edge and the robots.txt edits have no effect. Disable the rule in the Cloudflare dashboard under Security → Bots. The robots.txt edit then takes effect.

// After this tutorial

What to do next.

If FinancialProduct schema is still missing from your token pages, run Tutorial 02. AI crawlers can now fetch the page, but schema is what makes the content citable.
Run Tutorial 04 (E-E-A-T schema) for the third high-impact fix.
For ongoing tracking of the AEO lift from this fix, run Tutorial 05.

// Related

Run a free Crawlux audit and apply this tutorial.

Run a free Crawlux audit and follow the tutorial sequence start to finish.

Free first audit · No signup · 60 seconds · Full PDF report

After state (what good looks like)

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "speakable": {
    "@type": "SpeakableSpecification",
    "cssSelector": [".faq-answer", ".quick-answer"]
  },
  "mainEntity": [
    {
      "@type": "Question",
      "name": "Is Aave safe to use?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Aave has been audited 12+ times..."
      }
    }
  ]
}
</script>

How to validate the fix

✓ Schema.org Validator: 0 errors on the speakable property.
✓ DOM check: every cssSelector in speakable resolves to actual DOM elements on the page.
✓ AEO baseline test: capture citation rate before deploy across top 10 queries.
✓ AEO post-deploy test: run same 10 queries 14 days after deploy. Expect 30-50% lift in citation count.
✓ Voice test: ask Google Assistant a question your FAQ answers. Voice answer should pull from the Speakable-wrapped section.
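The DOM check in the list above can be scripted. What follows is a minimal sketch using only the Python standard library, under stated assumptions: it understands only simple ".classname" selectors (anything more complex is skipped, not validated), it reads JSON-LD from the static HTML, so markup injected by JavaScript is not seen, and the function name and SAMPLE_PAGE are illustrative.

```python
import json
from html.parser import HTMLParser


class PageScan(HTMLParser):
    """Collects JSON-LD payloads and every class name used on the page."""

    def __init__(self):
        super().__init__()
        self.classes = set()
        self.jsonld_blocks = []
        self._in_jsonld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        self.classes.update((attr_map.get("class") or "").split())
        if tag == "script" and attr_map.get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.jsonld_blocks.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


def unresolved_speakable_selectors(html):
    """Return the speakable class selectors that match no element in `html`."""
    scan = PageScan()
    scan.feed(html)
    scan.close()
    missing = []
    for block in scan.jsonld_blocks:
        speakable = block.get("speakable") if isinstance(block, dict) else None
        if not isinstance(speakable, dict):
            continue
        selectors = speakable.get("cssSelector", [])
        if isinstance(selectors, str):
            selectors = [selectors]
        for selector in selectors:
            # Only simple ".classname" selectors are checked in this sketch.
            if selector.startswith(".") and selector[1:] not in scan.classes:
                missing.append(selector)
    return missing


# Illustrative page: .faq-answer exists in the DOM, .quick-answer does not.
SAMPLE_PAGE = """
<html><body>
<div class="faq-answer">Aave has been audited 12+ times...</div>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "FAQPage",
 "speakable": {"@type": "SpeakableSpecification",
               "cssSelector": [".faq-answer", ".quick-answer"]}}
</script>
</body></html>
"""

print(unresolved_speakable_selectors(SAMPLE_PAGE))
```

An empty list means every simple class selector resolved. Any selector printed points at a class that no element on the page carries.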

Common pitfalls

Pitfall

Adding Speakable without FAQPage

Speakable can be added to Article or other types but works best paired with FAQPage. If your page doesn't have FAQPage yet, add it first. Speakable on a bare HTML page does little.

Pitfall

CSS selector that doesn't exist

If cssSelector points to .faq-answer but your DOM uses .accordion-content, Speakable resolves to nothing. Always validate that the selector actually matches DOM elements.

Pitfall

Pointing Speakable at the entire page

Don't use cssSelector: ["body"] or similar broad selectors. Speakable should point to specific answer-bearing elements. Broad selectors get ignored by parsers.

Pitfall

Speakable on pages without good answers

Speakable amplifies what's on the page. If your answers are vague or marketing-driven, Speakable amplifies vagueness. Make sure FAQ answers are concrete before adding Speakable.

Pitfall

Forgetting to update Speakable when content changes

If you redesign your FAQ block and change the class names, update the Speakable cssSelector. Stale selectors become invisible.

If something breaks: rollback

Remove the speakable property from the FAQPage schema. The page falls back to regular FAQPage behavior within minutes. Citation rate may regress, but there is no risk to site functionality.

Run a free Crawlux audit

Crawlux validates the schema, technical and AEO fixes from this tutorial automatically. Free tier on one domain.

FAQ

Does Speakable work outside FAQPage?

Yes. Speakable can be added to Article, BlogPosting, NewsArticle and most schema types. The pattern is the same: SpeakableSpecification with cssSelector or xpath pointing to the most-quotable sections of the page.

Will Speakable affect Google rich results?

Speakable isn't in Google's primary rich result types yet but it's parsed and used for voice answers. The main beneficiary is AI engines (ChatGPT, Perplexity, Claude) which weight Speakable-marked content higher for citations.

Can I use Speakable for marketing copy?

Technically yes but it backfires. AI engines extract Speakable content verbatim. If your marketing copy is promotional, AI engines may extract it but flag it as biased. Use Speakable for factual answers, not marketing claims.

How specific should cssSelector be?

Specific enough to match only the answer-bearing elements. .faq-answer is good. .content is too broad. Use class names dedicated to the answer sections, not generic content classes.

Does Speakable have a length limit?

There is no formal limit, but the practical limit is 1-3 sentences per Speakable section. AI engines extract these as direct answers, and anything longer than 3 sentences typically gets truncated. Optimize answers to 1-3 sentences for best extraction.
