llms.txt vs robots.txt: what crypto sites need in 2026
Two text files at the root of your domain do different jobs. robots.txt has been around since 1994 and controls crawler access. llms.txt, proposed in 2024, tells AI engines what your site is about. You need both. Most crypto sites have neither configured correctly.
What each file actually does
robots.txt: a directive file that tells crawlers (search engines, AI bots, archivers) which paths they may fetch. It's a politeness protocol: crawlers comply voluntarily, and nothing technically stops a bot that ignores it. Rules are grouped by user agent, with Allow and Disallow lines in each group. It was formalized as RFC 9309 in 2022, and both Google and OpenAI publicly commit to respecting it.
llms.txt: a structured Markdown file that explains your site to AI engines. Think of it as a sitemap that is human-readable and written for AI: it lists key pages, describes the product, and links to canonical resources. It was proposed by Jeremy Howard of Answer.AI in September 2024 and remains a community proposal, not a ratified standard.
They work at different layers: robots.txt controls access, llms.txt provides context. You need access configured correctly AND context provided cleanly. Skipping either one costs answer engine optimization (AEO) visibility.
AI bot allowlist in robots.txt
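Many CMSs ship a default robots.txt that blocks AI bots inadvertently or doesn't address them at all. Add an explicit group with an Allow directive for each bot you want crawling your site. Explicit groups matter because a crawler follows only the most specific group that names it, so a named group overrides a blanket User-agent: * disallow for that bot.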
# AI Crawlers (explicit allow)
User-agent: GPTBot
Allow: /
User-agent: OAI-SearchBot
Allow: /
User-agent: ChatGPT-User
Allow: /
User-agent: ClaudeBot
Allow: /
User-agent: PerplexityBot
Allow: /
User-agent: Google-Extended
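Allow: /
The list above is the minimum viable set for English-market crypto sites in 2026. Add Bytespider (ByteDance's crawler), Amazonbot, and Applebot-Extended if you serve audiences using those platforms' AI features. Note that Google-Extended and Applebot-Extended are control tokens rather than crawlers: they never show up in your access logs, and they only govern whether content fetched by Googlebot or Applebot may be used for Google's and Apple's AI models. Blocking Google-Extended does not affect your inclusion in Google Search.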
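Before deploying, you can sanity-check an allowlist with Python's standard-library urllib.robotparser. This is a minimal sketch that assumes a hypothetical site which also carries a blanket User-agent: * / Disallow: / group, to show that each named group overrides the wildcard for its bot while unnamed bots stay blocked. Treat it as a rough check only: Python's parser uses plain prefix matching with first-match precedence and doesn't implement RFC 9309's * and $ wildcards or longest-match rule, so it can disagree with Google's parser on complex files.

```python
from urllib.robotparser import RobotFileParser

# The AI allowlist from above, plus a hypothetical catch-all group that
# blocks every crawler not named explicitly.
ROBOTS_TXT = """\
# AI Crawlers (explicit allow)
User-agent: GPTBot
Allow: /
User-agent: OAI-SearchBot
Allow: /
User-agent: ChatGPT-User
Allow: /
User-agent: ClaudeBot
Allow: /
User-agent: PerplexityBot
Allow: /
User-agent: Google-Extended
Allow: /

User-agent: *
Disallow: /
"""

AI_AGENTS = ["GPTBot", "OAI-SearchBot", "ChatGPT-User",
             "ClaudeBot", "PerplexityBot", "Google-Extended"]


def check_agents(robots_txt: str, url: str, agents: list[str]) -> dict[str, bool]:
    """Map each user agent to whether robots_txt lets it fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    # "SomeOtherBot" is a made-up agent that only the "*" group covers.
    results = check_agents(ROBOTS_TXT, "https://yoursite.com/blog/",
                           AI_AGENTS + ["SomeOtherBot"])
    for agent, allowed in results.items():
        print(f"{agent:16} {'allowed' if allowed else 'BLOCKED'}")
```

Swap ROBOTS_TXT for the contents of your live file to see how each bot is treated before you push a change.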
Don't block AI bots just because you're worried about your content becoming training data. The crawlers that fetch content for real-time answers and citations (PerplexityBot, ChatGPT-User, OAI-SearchBot) are NOT the same as the training crawlers (GPTBot, ClaudeBot). Blocking the citation crawlers makes you invisible in those engines' live answers. Blocking the training crawlers is your call and doesn't affect citations; OpenAI, for example, documents that GPTBot and OAI-SearchBot are controlled independently.
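If you decide to keep the citation crawlers and opt out of training, generating both groups from one list keeps them from drifting apart. A minimal sketch: the split of user agents into citation and training simply mirrors the paragraph above and should be re-checked against each vendor's crawler documentation, and build_robots is a hypothetical helper name.

```python
# Classification mirrors the paragraph above; verify against vendor docs.
CITATION_AGENTS = ["OAI-SearchBot", "ChatGPT-User", "PerplexityBot"]
TRAINING_AGENTS = ["GPTBot", "ClaudeBot"]


def build_robots(block_training: bool) -> str:
    """Return robots.txt groups that always allow citation crawlers and
    allow or disallow training crawlers depending on block_training."""
    lines = ["# AI crawlers that fetch pages for live answers and citations"]
    for agent in CITATION_AGENTS:
        lines += [f"User-agent: {agent}", "Allow: /", ""]
    training_rule = "Disallow: /" if block_training else "Allow: /"
    lines.append("# AI crawlers that collect training data")
    for agent in TRAINING_AGENTS:
        lines += [f"User-agent: {agent}", training_rule, ""]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_robots(block_training=True))
```

Paste the output into your robots.txt alongside any existing groups; flipping one flag changes the training policy without touching the citation rules.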
llms.txt structure and content
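llms.txt is plain Markdown served at the root: https://yoursite.com/llms.txt. Its structure follows the proposal: an H1 with the site name (the only required element), a blockquote with a one-paragraph description, then H2 sections of links, each written as - [name](url) with optional notes after a colon.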
# Your Protocol
> Your Protocol is a decentralized lending platform with $200M+ TVL across 5 chains. We pioneered isolation mode for risky assets.
## Core product
- [Supply markets](https://yourprotocol.com/markets/): list of all supplied assets with current APYs
- [Borrow markets](https://yourprotocol.com/borrow/): borrowing rates and collateral requirements
- [Documentation](https://docs.yourprotocol.com/): technical docs and integration guides
## Key topics
- [What is isolation mode](https://yourprotocol.com/blog/isolation-mode/)
- [How our oracle works](https://yourprotocol.com/blog/oracle-design/)
- [Security audits](https://yourprotocol.com/security/)
Companion file: llms-full.txt at the same root. This is the long-form version, with the full text of your key pages inlined. It's a widely adopted convention rather than part of the original proposal; engines that support it can pull llms.txt for the index or llms-full.txt for the content itself. We host a 14KB llms-full.txt for Crawlux, small enough for an engine to fetch and read in full.
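Because llms-full.txt has no formal spec, its layout is up to you. A minimal sketch of one possible layout, assuming you already have Markdown versions of the pages listed in llms.txt: an H1 and summary, then one H2 per page with its canonical URL and full text. build_llms_full, within_budget, and the 16KB budget are illustrative choices, not part of any standard.

```python
SIZE_BUDGET_BYTES = 16 * 1024  # illustrative budget, not a documented engine limit


def build_llms_full(site_name: str, summary: str,
                    pages: dict[str, tuple[str, str]]) -> str:
    """Concatenate pages into one Markdown document.

    pages maps a page title to (canonical URL, Markdown body).
    """
    parts = [f"# {site_name}", "", f"> {summary}", ""]
    for title, (url, body) in pages.items():
        parts += [f"## {title}", "", f"Source: {url}", "", body.strip(), ""]
    return "\n".join(parts)


def within_budget(text: str, budget: int = SIZE_BUDGET_BYTES) -> bool:
    """Measure size in UTF-8 bytes, which is what a crawler downloads."""
    return len(text.encode("utf-8")) <= budget
```

Write the result with Path("llms-full.txt").write_text(text, encoding="utf-8") and rebuild whenever a listed page changes, so the inlined copy never lags the live page.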
Crypto-specific considerations
Two patterns specific to crypto sites.
Pattern 1: declare your token contract. If you have a token, llms.txt should link to a canonical token page listing the contract address on each chain where it's deployed. AI engines can pull this when answering 'what's the contract address for <TOKEN>?'. Without it, an engine may fall back on block-explorer results such as Etherscan, and it often guesses wrong for tokens with multiple deployments.
Pattern 2: declare your supported chains. Crypto-specific queries ('does <protocol> support Solana?') are frequent. List the supported chains explicitly as their own section in llms.txt, and keep that list consistent with llms-full.txt.
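Both patterns fit in llms.txt as ordinary H2 link sections. A minimal sketch that renders them from one data source so the token page and chain list stay in sync; the token URL, chain names, and market URLs are hypothetical placeholders, and the section titles "Token" and "Supported chains" are our choice rather than anything the proposal mandates.

```python
from typing import Optional

# Hypothetical placeholders; replace with your real pages.
TOKEN_PAGE_URL = "https://yourprotocol.com/token/"
CHAINS = {
    "Ethereum": "https://yourprotocol.com/markets/ethereum/",
    "Arbitrum": "https://yourprotocol.com/markets/arbitrum/",
    "Base": "https://yourprotocol.com/markets/base/",
}


def crypto_sections(token_page_url: Optional[str], chains: dict[str, str]) -> str:
    """Render llms.txt H2 sections for the token page and supported chains."""
    lines = []
    if token_page_url:  # omit the section entirely for protocols without a token
        lines += [
            "## Token",
            f"- [Token contract addresses]({token_page_url}): "
            "canonical contract address on every chain where the token is deployed",
            "",
        ]
    lines.append("## Supported chains")
    lines += [f"- [{chain}]({url})" for chain, url in chains.items()]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(crypto_sections(TOKEN_PAGE_URL, CHAINS), end="")
```

Append the output to the end of your llms.txt; if the same CHAINS data also drives llms-full.txt, the two files can't disagree about which chains you support.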
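The robots.txt for a crypto site should also block /audit-report/, /checkout/, and any URL that carries a wallet address as a query parameter, because wallet-addressed URLs can leak user data into search indices. Bear in mind that robots.txt blocks crawling, not indexing: a disallowed URL can still be indexed (without its content) if other pages link to it, and the robots.txt file itself is public. For URLs that must stay out of indices, keep wallet addresses out of URLs altogether, or serve a noindex directive on pages crawlers are allowed to fetch, since a crawler can't see noindex on a page it is blocked from.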
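To find which query parameters need rules, you can scan your URL inventory (a sitemap export or crawl list, for example) for values that look like wallet addresses. A minimal sketch that only recognizes EVM-style addresses (0x followed by 40 hex characters); Solana and other formats would need their own patterns. The generated Disallow lines rely on the * wildcard defined in RFC 9309, which Google supports but simpler parsers may not. wallet_params and disallow_rules are hypothetical helper names.

```python
import re
from urllib.parse import parse_qsl, urlsplit

# EVM-style address: "0x" followed by exactly 40 hex characters.
EVM_ADDRESS = re.compile(r"0x[0-9a-fA-F]{40}")


def wallet_params(url: str) -> list[str]:
    """Names of query parameters whose value is an EVM-style address."""
    return [name for name, value in parse_qsl(urlsplit(url).query)
            if EVM_ADDRESS.fullmatch(value)]


def disallow_rules(urls: list[str]) -> list[str]:
    """One robots.txt Disallow line per offending parameter name.

    "/*?*name=" matches any URL whose query string contains "name=",
    including longer parameter names that end in the same text.
    """
    names = sorted({name for url in urls for name in wallet_params(url)})
    return [f"Disallow: /*?*{name}=" for name in names]
```

Run disallow_rules over your full URL list and review the output before adding it, since an overly short parameter name can match more URLs than intended.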
Validation and monitoring
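robots.txt validators. In Google Search Console, Settings > robots.txt opens the robots.txt report, which shows the files Google fetched and any parse errors (it replaced the older robots.txt Tester). To confirm a specific URL isn't accidentally blocked for Googlebot, paste it into the URL Inspection tool. Both tools reflect Google's crawlers only.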
llms.txt validators. The proposal at llmstxt.org defines the expected structure; community-maintained validators check a file against it.
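Live test. Two weeks after deploying llms.txt, ask ChatGPT 'what does <your site> do?'. If the answer echoes the description from your llms.txt, the file is likely being read; the test is most reliable when that description is worded differently from your homepage copy. If the answer paraphrases your homepage H1 instead, the engine probably hasn't picked the file up yet.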
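Server log monitoring. Filter your access logs for User-Agent strings containing GPTBot, ClaudeBot, or PerplexityBot, and check for requests to /llms.txt in particular. You should see hits within 48 hours of deploying llms.txt. No hits after a week suggests the file isn't reachable or your CDN is blocking the bots; if the hits are there but return 403, the CDN or firewall is the likely culprit.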
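A minimal sketch of that log filter, assuming the common nginx/Apache "combined" log format (adjust LOG_LINE if yours differs); ai_bot_hits is a hypothetical helper. Google-Extended is left out because it has no user-agent string of its own. User-agent strings can also be spoofed, so treat the counts as indicative rather than proof that a given vendor crawled you.

```python
import re
from collections import Counter
from collections.abc import Iterable

# nginx/Apache "combined" format:
# IP ident user [time] "METHOD PATH PROTO" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)
AI_BOTS = ("GPTBot", "OAI-SearchBot", "ChatGPT-User", "ClaudeBot", "PerplexityBot")


def ai_bot_hits(lines: Iterable[str], path_prefix: str = "/") -> Counter:
    """Count AI-bot requests as (bot, path, status) under path_prefix."""
    hits: Counter = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match or not match["path"].startswith(path_prefix):
            continue  # unparseable line or a path we don't care about
        for bot in AI_BOTS:
            if bot in match["agent"]:
                hits[(bot, match["path"], match["status"])] += 1
                break
    return hits
```

For example, with open("access.log") as log: print(ai_bot_hits(log, "/llms.txt")). A mix of 200s means the file is being served; a column of 403s for one bot points at a CDN or firewall rule.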
Common questions
Is llms.txt required?
No. It's an emerging standard, not a mandate. Sites without it can still get cited. Sites with it consistently get cited at higher rates in our tests.
Do all AI engines respect llms.txt?
Not all of them. Anthropic has embraced it, publishing llms.txt files for its own documentation. OpenAI hasn't formally committed but has been observed reading it. Google hasn't confirmed support in AI Overviews. Treat it as best-effort, not guaranteed.
Can I have llms.txt without robots.txt?
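Technically yes, but don't. robots.txt isn't strictly required (if /robots.txt returns 404, crawlers assume everything is allowed), but it's the only place to control crawler access. llms.txt is supplemental.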
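How the file is served matters as much as what it says. Under RFC 9309, the HTTP status of /robots.txt decides what a compliant crawler assumes: on a 4xx it may crawl without restrictions, while a 5xx or a network error means it must treat the whole site as disallowed, so a misconfigured server can silently hide your site. A minimal sketch of that mapping plus a hypothetical live_policy helper that fetches your file; implementations differ at the edges (Python's own urllib.robotparser, for instance, treats 401 and 403 as disallow-all).

```python
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen


def robots_status_policy(status: int) -> str:
    """What an RFC 9309 crawler assumes from the /robots.txt HTTP status."""
    if 200 <= status < 300:
        return "parse rules"
    if 300 <= status < 400:
        return "follow redirect"  # RFC 9309: follow at least five hops
    if 400 <= status < 500:
        return "allow all"  # "unavailable": no restrictions apply
    if 500 <= status < 600:
        return "disallow all"  # "unreachable": assume complete disallow
    raise ValueError(f"unexpected HTTP status: {status}")


def live_policy(robots_url: str) -> str:
    """Fetch robots_url and map the outcome to the policy above."""
    request = Request(robots_url, headers={"User-Agent": "robots-status-check/0.1"})
    try:
        with urlopen(request, timeout=10) as response:  # follows redirects itself
            return robots_status_policy(response.status)
    except HTTPError as err:  # 4xx/5xx responses arrive as exceptions
        return robots_status_policy(err.code)
    except URLError:  # DNS failure, refused connection, etc.
        return "disallow all"
```

Run live_policy("https://yoursite.com/robots.txt") from outside your network; anything other than "parse rules" is worth investigating.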
How often should I update llms.txt?
When you launch a major new feature or page; otherwise quarterly. Keep it under 16KB: no engine documents a size limit, but a compact file is more likely to be read in full.
Will llms.txt hurt my Google SEO?
No. Google ignores llms.txt. It's purely a signal to AI engines.
