Do AI engines like ChatGPT and Google actually read llms.txt?

Anthropic (Claude) and Perplexity have publicly indicated they read llms.txt and weight it in retrieval. OpenAI (ChatGPT and ChatGPT Search) does not officially confirm parsing the file but reads it incidentally when fetching a domain's root. Google (AI Overviews, Gemini) has not adopted llms.txt as a documented signal but its crawlers fetch the URL and third-party retrieval systems that surface inside Gemini via tool calls do parse it. The fastest-growing user base is enterprise RAG systems and second-tier engines (Mistral, Cohere, Glean, You.com, Phind, Kagi, custom LangChain stacks) that default to checking /llms.txt when ingesting a domain.

How is llms.txt different from robots.txt and sitemap.xml?

robots.txt controls which bots are allowed to crawl which URLs (a permission file). sitemap.xml lists every URL on the site for traditional search engines to discover and index (a discoverability file). llms.txt does neither — it is a curated context document that summarizes the site's purpose in human-readable markdown and lists the 20-40 highest-authority URLs an AI engine should retrieve and cite. robots.txt is for crawling, sitemap.xml is for indexing, llms.txt is for AI citation. The three files coexist and serve complementary purposes.

What is the difference between llms.txt and llms-full.txt?

llms.txt is the manifest — a short markdown file listing the site's purpose and key URLs with descriptions. llms-full.txt is the optional companion file that contains the full text of every page listed in llms.txt concatenated into a single document, so an AI engine can ingest the entire site as one read without crawling individual pages. Most sites only need llms.txt. Publish llms-full.txt if the site is documentation-heavy (developer docs, API references, technical knowledge bases) where answering a typical question requires reading multiple pages together. Keep llms-full.txt under 1 MB.

How do I write a good llms.txt file?

Follow the seven-step checklist: write an H1 site name and a one-sentence blockquote summary; add 1-2 context paragraphs that name the industry, audience, and topics the site is authoritative on; pick H2 sections that match the site type (Docs, Pages, Blog, Guides, Services); curate 5-15 of the highest-authority URLs per section instead of dumping everything; write entity-rich one-line descriptions for every link in the form `- [Title](full-URL): description`; put lower-priority links under a final ## Optional section; publish at /llms.txt with content-type text/markdown and verify it loads in a private browser window. Use the free llms.txt generator on this site to produce a spec-compliant first draft in 30 seconds, then edit before publishing.

Will publishing llms.txt actually improve my AI search visibility?

It will improve retrieval cost, framing control, and citation likelihood at the engines that parse it (Claude, Perplexity, and the growing long tail of RAG systems). It will not, by itself, get a brand cited inside ChatGPT or Google AI Overviews — those engines weight tier-1 editorial citations, complete schema, and Wikidata entries far more heavily. llms.txt is one of seven AEO/GEO infrastructure layers; publishing it is cheap insurance that compounds as more engines adopt the spec, but it is not a substitute for the other six layers. The right framing: necessary, not sufficient.

How often should I update my llms.txt file?

Re-audit quarterly. Update immediately whenever the site reorganizes URLs, launches a new pillar page or major service, retires a service, or rebrands. Stale llms.txt files are worse than missing files because they actively misdirect AI engines toward URLs that no longer exist or no longer represent the brand. Set a recurring calendar reminder, treat the file as living infrastructure rather than a one-time deployment, and verify after every site migration that the path /llms.txt still resolves and the content-type header is still correct.

Why is my llms.txt being ignored by AI engines?

Eight common failures cause AI engines to silently skip an llms.txt file: relative URLs instead of fully qualified ones; bullet links missing the colon-separated description; nested headings (H3 or H4) that violate the flat spec; HTML tags inside the markdown; dumping every site URL instead of curating 20-40; stale links pointing at moved or deleted pages; wrong content-type header (application/octet-stream or a forced download); and aggressive Cloudflare or Akamai WAF rules blocking AI crawlers (GPTBot, ClaudeBot, PerplexityBot, CCBot) before they ever reach the file. Fix the file format first, then audit the CDN allowlist.

Complete Guide

llms.txt Explained: How ChatGPT, Claude & Perplexity Use It

Smart Money Media Team · 21 min read · Updated Jul 6, 2026

llms.txt is one of the most under-deployed pieces of AI search infrastructure on the modern web — but it is not a Google Search ranking factor. Proposed in September 2024 as a robots.txt-style manifest for large language models, the file lives at the root of a website (/llms.txt), summarizes what the site is authoritative on, and gives AI crawlers a structured map of the pages worth citing. ChatGPT, Perplexity, and Claude are the AI engines most likely to use it today; Google has publicly stated that llms.txt is not used in Google Search or AI Overviews. This guide explains what llms.txt is, which engines actually read it, the exact format the spec requires, and how to publish one for your own site today.


Quick Summary

A complete reference on llms.txt — the proposed AI manifest that tells ChatGPT, Claude, and Perplexity what a site is about and which pages to cite. Covers the spec, which engines actually read it (and Google's official position that Search does not), the difference between llms.txt and llms-full.txt, real published examples, the seven-step checklist for writing one that AI engines actually use, common mistakes that get the file ignored, and how llms.txt fits inside a full Answer Engine Optimization (AEO) and Generative Engine Optimization (GEO) program.

What is llms.txt?

llms.txt is a plain-text markdown file placed at the root of a website (/llms.txt) that tells large language models what the site is about, who it serves, and which canonical URLs they should retrieve and cite when answering user questions in the site's subject area. The format was proposed by Jeremy Howard of Answer.AI in September 2024 and is documented at llmstxt.org. It is the AI-search analog to robots.txt (which controls crawling) and sitemap.xml (which lists URLs for traditional search engines), but solves a different problem: helping LLMs understand context and authority rather than discoverability alone.

The file is intentionally small, human-readable, and parseable in a single LLM context window. Where a sitemap might list 10,000 URLs in an XML structure designed for crawlers, an llms.txt lists the 20-40 pages a model actually needs to ground a quality answer about the site, each with a one-line description of what it is. That curation is the entire point: the file is a brand-controlled retrieval index, not a complete inventory.

The three files are easiest to understand side by side:

File	Audience	Purpose	Format	Typical Size
`/robots.txt`	Traditional + AI crawlers	Permission rules — which bots may crawl which paths	Plain text directives	< 5 KB
`/sitemap.xml`	Google, Bing, traditional search	Discoverability — full inventory of every indexable URL	XML	Up to 50 MB / 50K URLs per file
`/llms.txt`	LLMs and RAG systems (Claude, Perplexity, enterprise AI)	Citation — curated context + the 20-40 URLs worth citing	Markdown	1-5 KB

Key Takeaway: llms.txt is a markdown manifest at /llms.txt that gives AI engines a curated, one-screen map of the pages a site is authoritative on — robots.txt is for crawling, sitemap.xml is for indexing, llms.txt is for AI citation.

llms.txt is one of the tactical layers inside the broader LLM SEO discipline — the umbrella practice of engineering brand visibility across ChatGPT, Perplexity, Claude, and Gemini. For the broader discipline of being cited inside AI-generated answers, see the Zero-Click Marketing pillar guide and the AEO agency service page.

Does Google Use llms.txt? (The Honest Answer)

No. Google has publicly stated that llms.txt is not a ranking input for Google Search or AI Overviews — publishing one will not directly affect how Google indexes, ranks, or cites your site. In its June 2026 AI optimization guide for Search, Google wrote verbatim: "You don't need to create new machine readable files, AI text files, markup, or Markdown to appear in generative AI search." That is the official position and it should be taken at face value. Anyone selling llms.txt as a Google ranking hack is selling something Google has explicitly disclaimed.

That said, "Google does not use it" is not the same as "no AI engine uses it." llms.txt is a real, evolving piece of infrastructure for the non-Google AI ecosystem:

Anthropic (Claude) publishes its own /llms.txt and has acknowledged the format in developer documentation.
Perplexity crawls and surfaces llms.txt content during retrieval for site-grounded answers.
Enterprise RAG systems built on LangChain, LlamaIndex, and custom pipelines routinely use llms.txt as a curated ingestion source for grounded internal search.
Bing / Copilot has not officially endorsed the spec but does fetch the file on many domains.
ChatGPT (OpenAI) has not officially confirmed using llms.txt for ranking, but the ChatGPT browse tool can retrieve and parse it on demand when answering site-specific questions.

Key Takeaway: llms.txt is for the non-Google AI ecosystem — Claude, Perplexity, ChatGPT browse, Bing Copilot, and enterprise RAG. For Google Search and Google AI Overviews specifically, the file does nothing and should not be sold as if it does.

Two practical implications follow. First, llms.txt is cheap to publish and worth doing — but it is a coverage play for the non-Google share of AI traffic, not a Google SEO tactic. Second, the things that DO drive visibility inside Google AI Overviews are the same things that drive traditional Google rankings: unique, expert content; clean technical structure; semantic HTML; and earned third-party citations from authoritative publications. That is the discipline our Authority Buildout program runs.

Why llms.txt Matters Right Now

AI engines have replaced the search results page as the first surface where most B2B research, vendor evaluation, and brand discovery happens, and those engines weight curated, structured signals from a brand's own domain more heavily than scraped content from the open web. A Reuters Institute survey found that news publishers expect search engine traffic to fall by more than 40% over the next three years, as AI Overviews and conversational AI engines intercept queries before users reach a results page. Inside that shift, the brands that hand AI engines a clean, accurate context document have a structural advantage over brands that leave the engines to guess.

Three concrete things change when a site publishes a working llms.txt:

Retrieval cost drops, citation likelihood rises. When an AI engine builds an answer, it pulls candidate documents into context. A 2 KB llms.txt is dramatically cheaper to retrieve and parse than a JavaScript-rendered homepage, and the engine gets a higher signal-to-noise ratio per token. Cheaper retrieval correlates with more frequent citation.
The brand controls the framing. Without llms.txt, the engine assembles its own summary of what a site is from whatever pages it happens to have scraped — sometimes outdated, sometimes wrong. With llms.txt, the brand writes the one-sentence summary the engine reads first.
Pillar content gets surfaced over noise. Most sites have a handful of pages worth citing (pillar guides, services, About) and a long tail of pages that dilute authority (tag archives, paginated category pages, old blog posts). llms.txt is the brand's chance to point AI engines at the right 20 URLs.

The file does not guarantee citation — no AI-engine input does — and several major models do not yet officially read it. But the cost of publishing is roughly one hour of work and the upside is a permanent piece of AI-search infrastructure that compounds as more engines adopt the spec.

Key Takeaway: llms.txt is cheap insurance on the AI-search transition — sites without it leave their framing, their canonical URLs, and their citation likelihood to whatever the engine happens to scrape.

The Exact llms.txt Format (Per llmstxt.org)

The llms.txt specification is strict on structure and intentionally flat: no nested headings beyond H2, no HTML, no code fences, just markdown a language model can parse in one pass. A spec-compliant file has six elements. Only the H1 is required, but all six are recommended for a production deployment:

H1 site name — required. The only required element in the spec. Example: # Smart Money Media
Blockquote summary — a single line starting with > that summarizes what the site is in one sentence. Highly recommended.
Context paragraphs — one or two short markdown paragraphs explaining the site, the audience, and what makes it authoritative. No headings inside this block.
H2 section headings — group related links under section names like ## Docs, ## Pages, ## Blog, ## Examples, or ## Resources. Use only the sections that apply.
Markdown link lists — each H2 section contains a bullet list in the exact form - [Title](full-URL): one-line description. The colon-description pattern is what AI engines parse.
## Optional — the final H2 section. Lists lower-priority links that AI engines with a tight context budget can safely skip. This is the spec's signal for "you can drop these if you need to."

A minimal compliant example:

# Acme Invoicing

> Automated invoice reconciliation for mid-market finance teams.

Acme Invoicing is a B2B SaaS used by 1,200+ controllers to match bank deposits to open invoices automatically. Founded 2021. SOC 2 Type II.

## Docs

- [Getting Started](https://acme.com/docs/getting-started): 15-minute setup guide for new accounts.
- [API Reference](https://acme.com/docs/api): REST API for bulk reconciliation and webhooks.

## Pages

- [Pricing](https://acme.com/pricing): Per-seat pricing tiers and enterprise options.
- [Security](https://acme.com/security): SOC 2 Type II, encryption, and data residency policies.

## Optional

- [Changelog](https://acme.com/changelog): Weekly product updates.
- [Status](https://status.acme.com): Real-time uptime and incident history.

The two non-negotiables: every link is fully qualified (no relative paths) and every link has a colon-separated description. AI engines parse the description, not the URL, when deciding whether to retrieve.
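To make the colon-description pattern concrete, here is a minimal Python sketch of how a retrieval pipeline might split a file like the Acme example into a summary, context, sections, and links, and drop the ## Optional links when the context budget is tight. It is an illustration under our own assumptions, not the parser any AI engine actually runs; `parse_llms_txt`, `Link`, `LlmsTxt`, and `core_links` are hypothetical names.

```python
import re
from dataclasses import dataclass, field

# "- [Title](URL): description"; the ": description" part is optional so that
# bare links still parse (with an empty description).
LINK_RE = re.compile(r"^- \[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<desc>.*))?$")


@dataclass
class Link:
    title: str
    url: str
    description: str


@dataclass
class LlmsTxt:
    name: str = ""
    summary: str = ""
    context: list = field(default_factory=list)
    sections: dict = field(default_factory=dict)  # H2 heading -> list[Link], in file order


def parse_llms_txt(text: str) -> LlmsTxt:
    doc = LlmsTxt()
    current = None  # H2 section being filled; None while still in the header block
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("## "):
            current = line[3:].strip()
            doc.sections[current] = []
        elif line.startswith("# "):
            doc.name = line[2:].strip()
        elif current is None and line.startswith(">"):
            doc.summary = line.lstrip("> ").strip()
        elif current is None:
            doc.context.append(line)  # free-form context paragraphs
        else:
            match = LINK_RE.match(line)
            if match:
                doc.sections[current].append(
                    Link(match["title"], match["url"], (match["desc"] or "").strip())
                )
    return doc


def core_links(doc: LlmsTxt) -> list:
    """Links an engine with a tight context budget would keep: everything except ## Optional."""
    return [link for name, links in doc.sections.items() if name != "Optional" for link in links]
```

A bullet without a `: description` still parses here, but only as a link with an empty description, which is exactly why such links carry no relevance signal.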

Key Takeaway: The spec is strict and short — H1, blockquote summary, context paragraphs, H2 sections with - [Title](url): description bullets, and a final ## Optional section. No nesting, no HTML, no creative interpretations.

How AI Engines Actually Use llms.txt

The llms.txt spec is a proposal, not a standard, and the major AI engines have adopted it at different rates and in different ways — knowing which engines parse the file changes how much effort is worth investing in it. As of 2026, the practical picture is mixed but trending in one direction.

Anthropic (Claude) and Perplexity have publicly indicated they read llms.txt when present and use it to inform retrieval. Perplexity in particular treats it as a first-pass index for "site:" style queries about a brand.
OpenAI (ChatGPT, ChatGPT Search) does not officially confirm parsing llms.txt, but the file is small enough that any engine doing site-level retrieval reads it incidentally when it requests the root of the domain.
Google (AI Overviews, Gemini) has not adopted llms.txt as a documented signal. Google's AI surfaces lean on the existing structured-data ecosystem (schema, sitemap, Knowledge Graph). However, Google's crawlers do request /llms.txt, and the file is read by third-party retrieval systems that surface inside Gemini answers via tool calls.
The expanding long tail — Mistral, Cohere, Glean, You.com, Phind, Kagi Assistant, and a fast-growing list of enterprise RAG systems built on top of LangChain, LlamaIndex, and similar frameworks default to checking /llms.txt when ingesting a domain. This is where the compounding value lives.

Practically, treat llms.txt the same way a sensible operator treated schema.org in 2014: not every engine uses it today, the engines that do use it weight it modestly, but the file is cheap to publish and the cost of being late to a standard that becomes universal is far higher than the cost of being early.

Key Takeaway: Claude and Perplexity actively use llms.txt today. ChatGPT and Gemini do not officially, but read it incidentally. The real prize is the expanding long tail of enterprise RAG systems and second-tier engines that default to checking it.

Seven-Step Checklist for Writing an llms.txt That Engines Actually Use

Most llms.txt files in the wild today are either auto-generated junk (every URL on the site dumped into one bucket) or so terse they convey no useful context — both get ignored by engines that have learned to weight high-signal manifests over high-noise ones. The checklist below is the seven-step process Smart Money Media uses for client deployments.

Step 1: Write the H1 and blockquote as if to a brand-new analyst. The H1 is the site name. The blockquote is one sentence — what the site is, who it serves, what makes it different. Read it back to yourself: if it could describe four other companies, rewrite it.

Step 2: Add one or two context paragraphs that include the brand's key entities. Mention the industry, the audience, founding year, notable certifications or partnerships, and the two or three topics the site is most authoritative on. AI engines use this block to decide whether the site is a relevant retrieval source for a given query.

Step 3: Pick the H2 sections that match the site type. A SaaS site usually has ## Docs, ## Pages, ## Blog. A media site usually has ## Pillar Guides, ## Categories, ## Glossary. A services agency usually has ## Services, ## Guides, ## Case Studies. Use the names that match how a reader would describe the site.

Step 4: Curate, do not enumerate. Each section should contain 5-15 of the highest-authority URLs in that category. A 2 KB file with 25 carefully chosen links outperforms a 200 KB file with every URL on the site — the goal is retrieval relevance, not coverage.

Step 5: Write descriptions that include entities, not adjectives. "Comprehensive guide to PR strategy" is useless. "Step-by-step framework covering goal-setting, audience segmentation, media relations, and ROI measurement for B2B PR programs" is what an AI engine can match to a user query.

Step 6: Put low-priority links under ## Optional. Changelog, status page, RSS feed, legal pages — anything that is real content but should be the first thing dropped if the engine has a tight context budget.

Step 7: Publish at /llms.txt with the correct headers. The file must be served at the exact path /llms.txt with Content-Type: text/markdown (or text/plain). Verify it loads in a private browser window, then test it with at least one AI engine by asking a brand-specific question and watching whether the engine retrieves URLs from your file.
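Steps 3 through 6 can be enforced mechanically when the file is generated from structured data. The sketch below is a hypothetical `render_llms_txt` helper, not part of any official tooling: it emits the H1, blockquote, context paragraphs, and H2 sections in order, caps each non-optional section at 15 links, rejects relative URLs and empty descriptions, and always places ## Optional last. Serving the result with the right Content-Type (Step 7) is still the web server's job.

```python
from urllib.parse import urlparse

MAX_LINKS_PER_SECTION = 15  # Step 4: 5-15 curated links per section


def render_llms_txt(name, summary, context, sections, optional=()):
    """Render an llms.txt document.

    context:  list of plain paragraphs (Step 2).
    sections: list of (heading, [(title, url, description), ...]) in display order (Step 3).
    optional: (title, url, description) tuples for the final ## Optional section (Step 6).
    """
    lines = [f"# {name}", "", f"> {summary}", ""]
    for paragraph in context:
        lines += [paragraph, ""]
    all_sections = list(sections)
    if optional:
        all_sections.append(("Optional", list(optional)))  # always emitted last
    for heading, links in all_sections:
        if heading != "Optional" and len(links) > MAX_LINKS_PER_SECTION:
            raise ValueError(
                f"## {heading} has {len(links)} links; curate it to {MAX_LINKS_PER_SECTION} or fewer"
            )
        lines += [f"## {heading}", ""]
        for title, url, description in links:
            parsed = urlparse(url)
            if parsed.scheme not in ("http", "https") or not parsed.netloc:
                raise ValueError(f"not a fully qualified URL: {url}")
            if not description.strip():
                raise ValueError(f"missing description for {url}")
            lines.append(f"- [{title}]({url}): {description.strip()}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"
```

One advantage of keeping the source data structured, rather than hand-editing the markdown, is that the quarterly re-audit becomes a data change instead of a formatting exercise.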

Key Takeaway: Spec compliance is only the first 30 percent of a useful llms.txt — the rest is curation, entity-rich descriptions, and serving the file at the right path with the right content type.

The fastest way to get a compliant first draft is the free llms.txt generator on this site — paste your URL, the tool pulls your sitemap and homepage, and an AI composes a spec-compliant file in 30 seconds. Edit before publishing.

llms.txt vs llms-full.txt: When to Publish Both

A companion file, llms-full.txt, is the full-text concatenation of every page listed in llms.txt, designed for AI engines that want to ingest an entire site as a single document instead of crawling each page individually. Most sites do not need it. The sites that benefit are documentation-heavy products (developer docs, API references, technical knowledge bases) where the AI engine's primary use case is answering "how do I do X with your product" questions and the answer requires reading multiple pages together.

Decision rule: publish llms-full.txt if the site has 30+ pages of evergreen technical content that an AI engine would routinely need to read together to answer a question. Skip it if the site is a brand site, an agency site, a marketing site, or a blog — for those, llms.txt alone gives the engine what it needs to retrieve the right individual page on demand.

When publishing both, the convention is to list llms-full.txt as a single bullet under llms.txt's primary section so the engine can find it without crawling. Keep the file under 1 MB; engines with smaller context windows ignore files above that threshold.
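Under those rules, building llms-full.txt is mostly concatenation plus a size check. This is a minimal sketch assuming you already have clean markdown for each page; `build_llms_full` is a hypothetical helper, the per-page `# Title` and `Source:` header is our own convention rather than a spec requirement, and "under 1 MB" is interpreted as 1,000,000 bytes of UTF-8.

```python
MAX_BYTES = 1_000_000  # "under 1 MB", read here as 1,000,000 bytes of UTF-8


def build_llms_full(pages):
    """Concatenate pages into one llms-full.txt string.

    pages: (title, url, markdown_body) tuples in the same order as llms.txt.
    """
    parts = []
    for title, url, body in pages:
        # Per-page header (our own convention) keeps each section attributable to its URL.
        parts.append(f"# {title}\n\nSource: {url}\n\n{body.strip()}\n")
    full = "\n".join(parts)
    size = len(full.encode("utf-8"))
    if size > MAX_BYTES:
        raise ValueError(f"llms-full.txt would be {size:,} bytes; trim it below {MAX_BYTES:,}")
    return full
```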

How to Budget an llms.txt for a 100K-Token Context Window

Even when an AI engine will happily ingest a 100,000-token file, burying your most important brand facts around token 40,000 makes it likely they get deprioritized, because LLMs attend to tokens near the top of a context window far more reliably than to tokens buried in the middle. The fix is context budgeting: decide in advance which facts get the first 2,000 tokens, which get the next 15,000, and which live in the tail.

The three-tier structure Smart Money Media uses on client deployments:

Tier 1 — First 2,000 tokens (mandatory). Core brand definition, one-line UVP, category positioning, founding year, and the two or three topics the site is authoritative on. This is what the engine reads before it decides whether your site is a relevant retrieval source at all.
Tier 2 — Next 15,000 tokens (contextual). Primary service or product descriptions, pillar-guide links with entity-rich descriptions, FAQ blocks that mirror real buyer questions, and any proprietary methodologies you want the engine to attribute to you by name.
Tier 3 — Remaining budget (supporting). High-performing evergreen blog posts, redacted case studies, industry perspective pieces, and glossary-style definitional content. Anything the engine can safely truncate without losing the brand's core framing.

Two operating rules follow. First, if your combined file exceeds the engine's context window, retrieval is not truncated politely — it is truncated arbitrarily, often mid-sentence. Second, most engines apply attention decay across long contexts, so the last 10 percent of a maxed-out file is functionally invisible even when the engine claims to have read it. Aggressive editing beats sheer volume every time.
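The tier arithmetic is easy to check before publishing. The sketch below assumes roughly four characters per token, a common rule of thumb for English prose that will differ from any specific model's tokenizer; `estimate_tokens` and `check_budget` are hypothetical helpers, and the limits are the 2,000 / 15,000 / 100,000-token figures from this section.

```python
TIER_LIMITS = {1: 2_000, 2: 15_000}  # tokens; Tier 3 takes whatever budget remains
CONTEXT_WINDOW = 100_000


def estimate_tokens(text: str) -> int:
    # Rough heuristic: about 4 characters per token, rounded up.
    # Swap in the target model's real tokenizer for accurate numbers.
    return (len(text) + 3) // 4


def check_budget(tiers: dict) -> list:
    """tiers maps tier number (1, 2, 3) to that tier's text; returns a list of problems."""
    problems = []
    total = 0
    for tier in sorted(tiers):
        tokens = estimate_tokens(tiers[tier])
        total += tokens
        limit = TIER_LIMITS.get(tier)
        if limit is not None and tokens > limit:
            problems.append(f"Tier {tier}: ~{tokens:,} tokens exceeds {limit:,}")
    if total > CONTEXT_WINDOW:
        problems.append(f"Total: ~{total:,} tokens exceeds the {CONTEXT_WINDOW:,}-token window")
    return problems
```

An empty list means every tier fits; any message points at the tier to edit first.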

Key Takeaway: Treat the first 2,000 tokens as your one and only chance to define the brand — Tier 2 gets services and pillars, Tier 3 gets everything else. Length without hierarchy is invisible to the engine.

What to Exclude From Your llms.txt (Security and Reputation Guardrails)

Auto-generators that concatenate an entire sitemap into llms.txt or llms-full.txt routinely pull content that should never be handed to an AI engine — exposed API endpoints, staging URLs, executive personal contact details, and unredacted client data all leak this way. A curated exclusion list is not optional for any production deployment.

Content Vector	What Good Looks Like	Common Destructive Mistake
Product documentation	Clean markdown explaining capabilities, use cases, and public API surface	Accidental inclusion of internal API endpoints, admin routes, or leaked developer keys
Executive information	High-level biographies and public-facing thought leadership	Exposing direct cell numbers, personal emails, or home-city references pulled from bios
Sales narratives	Neutral, educational comparison copy with verifiable claims	Aggressive "buy now" language that AI engines filter as promotional spam
Customer evidence	Anonymized, statistically sound case-study outcomes	Leaking sensitive client usage data, contract terms, or logo permissions without prior redaction
Environments	Production URLs only, served with correct canonical tags	Staging subdomains, preview URLs, and internal QA paths concatenated into the file

Bake the exclusion list into whatever script generates the file — do not rely on manual review of a 30,000-line output. A single explicit deny-list in the build step (staging subdomains, /admin, /internal, /api/private, user-generated comment feeds, any URL containing customer identifiers) prevents almost every real-world leak we have audited.
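A build-step deny-list can be as simple as a URL filter that runs before anything is written to llms.txt or llms-full.txt. The host prefixes, path prefixes, and query-parameter patterns below are illustrative assumptions mirroring the categories named above; replace them with the patterns your own site actually uses. `is_allowed` and `filter_urls` are hypothetical names.

```python
import re
from urllib.parse import urlparse

# Hypothetical deny-list; adapt every pattern to your own URL structure.
DENY_HOST_PREFIXES = ("staging.", "preview.", "qa.", "dev.")
DENY_PATH_PREFIXES = ("/admin", "/internal", "/api/private")
DENY_PATTERNS = [
    re.compile(r"/comments?(/|$)"),                          # user-generated comment feeds
    re.compile(r"[?&](customer|account|user)_?id=", re.I),   # customer identifiers in query strings
]


def is_allowed(url: str) -> bool:
    parsed = urlparse(url)
    host = parsed.hostname or ""
    if host.startswith(DENY_HOST_PREFIXES):
        return False
    path = parsed.path or "/"
    # Match whole path segments so "/administration-guide" is not caught by "/admin".
    if any(path == p or path.startswith(p + "/") for p in DENY_PATH_PREFIXES):
        return False
    target = path + ("?" + parsed.query if parsed.query else "")
    return not any(p.search(target) for p in DENY_PATTERNS)


def filter_urls(urls):
    return [u for u in urls if is_allowed(u)]
```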

Key Takeaway: Treat llms.txt as a curated public library, not a database dump. An explicit exclusion list in the generator script — staging, admin, private API paths, PII, and unredacted client data — is mandatory for any production deployment.

Real-World llms.txt Examples (And What They Get Right)

The fastest way to write a good llms.txt is to study the ones already in production at companies whose AI-search results are visibly working — and to copy the structural choices, not the content. Four worth dissecting:

Anthropic (anthropic.com/llms.txt). The company that builds Claude publishes a minimal, ruthlessly curated file. One H1, one blockquote, three H2 sections (Docs, API, Research), roughly 30 links total. Every description is a short clause that names the topic and the audience. The lesson: even the company writing the consuming model keeps the file under 5 KB.

Cloudflare (cloudflare.com/llms.txt). Cloudflare's file is heavier on docs and API references because the company's AI-search use case is developer questions. Sections are organized by product (Workers, R2, D1, Pages), each with 10-20 of the highest-traffic doc URLs. The lesson: section structure should match how users ask questions about you, not how your nav is organized.

Hugging Face (huggingface.co/llms.txt). The model hub publishes a much longer file because its AI-search use case is genuinely "give the engine the whole library." This is one of the rare cases where llms-full.txt also makes sense. The lesson: file length should match the breadth of evergreen content, not a one-size rule.

Smart Money Media (smartmoneymedia.org/llms.txt). Our own file is the agency-site template: H2 sections for Services, Pillar Guides, Free Tools, and Blog, with 5-15 links each and entity-rich descriptions that name the AEO/GEO/PR concepts each page covers. View it directly to see the agency pattern in action.

Key Takeaway: Study three or four published files in your category before writing yours — the section structure that wins is the one that matches how prospects actually ask AI engines about you, not the one that mirrors your sitemap.

How to Test Whether AI Engines Are Actually Reading Your llms.txt

Publishing the file is only the first half — the second half is verifying that AI engines fetch it, parse it, and surface the URLs it points to when prospects ask brand-related questions. Six tests, in order of effort:

Fetch test. In a private browser window, request https://yourdomain.com/llms.txt. Confirm it returns HTTP 200, the body is your markdown, and the response header is Content-Type: text/markdown or text/plain — not application/octet-stream and not a forced download.
Crawl-log test. Filter your server or CDN access logs for requests to /llms.txt over the last 30 days. You should see hits from AI crawlers such as GPTBot, ClaudeBot, PerplexityBot, CCBot, and Applebot. (Google-Extended is a robots.txt control token, not a separate user agent, so Google's fetches show up under its standard crawlers such as Googlebot.) If you see zero, a WAF or bot-management rule is the most likely cause: allowlist those user agents at the CDN layer, and place the allowlist skip rule above any country, bot-score, or rate-limit block rule.
Brand-query test in Perplexity. Ask Perplexity a brand-specific question ("What does [your brand] do?" or "Does [your brand] offer [your specific service]?"). Click into the sources. If the URLs cited are pages listed in your llms.txt, the file is influencing retrieval. If Perplexity is citing random scraped pages from your domain, your descriptions are not entity-rich enough.
Brand-query test in Claude. Repeat the same question in Claude with web search enabled. Claude is one of the engines that most actively reads llms.txt, so weak retrieval here usually means a format problem in the file itself.
Site-specific URL test. Ask an engine for a specific URL: "What is the Smart Money Media zero-click marketing guide about?" If the engine retrieves the correct URL with a summary close to the description in your llms.txt, the file is being parsed correctly. If the engine pulls a different URL or hallucinates a summary, your descriptions need rewriting.
Free AI Visibility Audit. Run the free AI Visibility Audit on your domain. The audit explicitly tests whether your llms.txt exists, parses, and is being retrieved by major engines — and surfaces the gaps in the surrounding six AEO/GEO layers.
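The first two tests are easy to script. This sketch uses only the Python standard library; `check_fetch`, `fetch_and_check`, and `count_bot_hits` are hypothetical helpers, the `llms-txt-check/0.1` user agent is made up, the log parser assumes the common combined log format (request line as the first quoted field, user agent as the last), and the bot list is the set of crawler user agents named above. User-agent strings can be spoofed, so the counts are indicative rather than proof.

```python
import re
import urllib.error
import urllib.request
from collections import Counter

ACCEPTED_TYPES = ("text/markdown", "text/plain")
AI_BOTS = ("GPTBot", "ClaudeBot", "PerplexityBot", "CCBot", "Applebot")
QUOTED = re.compile(r'"([^"]*)"')


def check_fetch(status, content_type, disposition=""):
    """Fetch test: HTTP 200, a markdown/plain-text Content-Type, and no forced download."""
    problems = []
    if status != 200:
        problems.append(f"expected HTTP 200, got {status}")
    media_type = (content_type or "").split(";")[0].strip().lower()
    if media_type not in ACCEPTED_TYPES:
        problems.append(f"Content-Type is {media_type or 'missing'}; expected text/markdown or text/plain")
    if (disposition or "").lower().startswith("attachment"):
        problems.append("Content-Disposition forces a download")
    return problems


def fetch_and_check(url, timeout=10):
    """Run the fetch test against a live URL (performs a network request)."""
    req = urllib.request.Request(url, headers={"User-Agent": "llms-txt-check/0.1"})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return check_fetch(resp.status, resp.headers.get("Content-Type"),
                               resp.headers.get("Content-Disposition"))
    except urllib.error.HTTPError as err:  # 4xx/5xx responses raise instead of returning
        return check_fetch(err.code, err.headers.get("Content-Type"))


def count_bot_hits(log_lines, path="/llms.txt"):
    """Crawl-log test: count requests for `path` per AI crawler user agent."""
    hits = Counter()
    for line in log_lines:
        fields = QUOTED.findall(line)
        if len(fields) < 2:
            continue
        request_line, user_agent = fields[0], fields[-1].lower()
        parts = request_line.split()
        if len(parts) < 2 or parts[1].split("?")[0] != path:
            continue
        for bot in AI_BOTS:
            if bot.lower() in user_agent:
                hits[bot] += 1
    return hits
```

Usage: `fetch_and_check("https://yourdomain.com/llms.txt")` should return an empty list, and `count_bot_hits(open("access.log"))` should show non-zero counts for the crawlers you expect.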

Key Takeaway: Publishing is not verifying. Run the six-test sequence at least once a quarter — fetch, crawl-log, two brand queries, one URL-specific query, and an automated audit — to confirm the file is doing what you published it for.

Industry-Specific llms.txt Patterns

A generic template will produce a generic llms.txt that gets generic results — the highest-performing files are tuned to the questions AI engines actually receive about that specific industry. Four common patterns:

SaaS products. Lead with ## Docs and ## API sections because developer questions ("how do I integrate X with your product") dominate AI-engine traffic for SaaS brands. Then add ## Pages (pricing, security, integrations), and move blog posts under ## Optional. Publish llms-full.txt if the doc set is large enough that answering a typical question requires multiple pages.

Agencies and professional services. Lead with ## Services using one bullet per service with a description that names the deliverables and the target client. Follow with ## Pillar Guides (the proof of expertise), ## Case Studies (the proof of results), and ## About. Skip llms-full.txt — agency content is too varied to concatenate usefully.

Ecommerce. Lead with ## Categories rather than individual product URLs, because AI engines answering "where do I buy X" cite category and comparison pages far more often than individual SKUs. Follow with ## Buying Guides, ## Brand (sustainability, shipping, returns policy — the trust signals), and ## Optional for the changelog of new arrivals.

Media and publishing. Lead with ## Pillar Guides and ## Categories. Each category gets one bullet linking to the category page, not the individual articles — let the engine crawl from there. Add a ## Glossary section if the site publishes definitional content; this is high-value retrieval bait for AI engines answering "what is X" questions. Skip llms-full.txt.

Key Takeaway: Match the section ordering to the AI-engine query types your industry actually receives — developer questions for SaaS, services questions for agencies, category questions for ecommerce, definitional questions for media.

Common Mistakes That Get llms.txt Ignored

The most common llms.txt failure is not absence — it is publishing a file that AI engines silently skip because it violates the format or contains low-signal content. The mistakes below recur on roughly 60 percent of the llms.txt files we audit on prospect domains.

Relative URLs. Links like - [About](/about): ... fail because the engine reading the file may not know which domain it is on. Every link must be fully qualified: https://yourdomain.com/about.
Missing descriptions. A bullet that reads - [Pricing](https://example.com/pricing) with no colon-separated description is parseable as a link but conveys zero relevance signal to the engine.
Nested headings. Some auto-generators emit H3 or H4 inside H2 sections. The spec is flat. Engines parsing the file ignore nested headings or, worse, treat the file as malformed and skip it entirely.
HTML inside the file. Tables, divs, anchor tags, and inline styling break the parser. llms.txt is markdown, not HTML.
Dumping every URL. An llms.txt with 2,000 links is not a curated manifest — it is a sitemap with bad formatting. The whole point is the curation.
Stale links. An llms.txt published in 2024 and never updated when the site reorganized its URLs is worse than no file at all because it actively misdirects the engine. Re-audit quarterly.
Wrong content-type header. Some hosts serve /llms.txt as application/octet-stream or send Content-Disposition: attachment, which forces a download. Serve it as text/markdown or text/plain. AI engines that receive a binary or download response may discard it.
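Blocked by robots.txt or CDN WAF rules. A Disallow rule in robots.txt, or an aggressive bot-blocking rule at the CDN layer (Cloudflare, Akamai, Fastly), can stop AI crawlers from ever reaching the file. Keep GPTBot, ClaudeBot, PerplexityBot, Google-Extended, and CCBot allowed in robots.txt. At the WAF, allowlist GPTBot, ClaudeBot, PerplexityBot, and CCBot by user agent. Google-Extended is a robots.txt token only, and Google fetches with its regular Googlebot user agents. Make sure the allowlist rule (a Skip rule, in Cloudflare) sits above any country-block, bot-score, or rate-limit rule in the WAF evaluation order.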
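Most of the mistakes above can be caught before the file goes live. Below is a minimal Python sketch of a pre-publish audit. It assumes you already have three things in hand: the file text, the response headers from whatever HTTP client you use, and the site's robots.txt text. The functions lint_llms_txt(), check_headers(), and blocked_agents() are hypothetical, and so is the 100-link threshold. The sketch does not detect stale links or WAF blocks, because both need live requests against the real site.

```python
import re
from urllib.robotparser import RobotFileParser

# "- [name](url)" bullets. Group 3 keeps whatever follows the link so a
# missing ": description" can be reported instead of silently skipped.
LINK_RE = re.compile(r"^\s*[-*]\s+\[([^\]]+)\]\(([^)\s]+)\)(.*)$")
# Illustrative, non-exhaustive list of HTML tags worth flagging.
HTML_RE = re.compile(r"</?(?:a|div|span|table|tr|td|th|p|br|style|script)\b", re.IGNORECASE)
# User-agent tokens named in this guide. Google-Extended is a robots.txt
# token only; Google fetches under its Googlebot user agents.
AI_AGENTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended", "CCBot"]


def lint_llms_txt(text, max_links=100):
    """Return (line_number, problem) tuples; line 0 means a whole-file problem."""
    problems, link_count = [], 0
    for lineno, line in enumerate(text.splitlines(), start=1):
        if re.match(r"#{3,}\s", line):
            problems.append((lineno, "nested heading (H3 or deeper)"))
        if HTML_RE.search(line):
            problems.append((lineno, "HTML markup"))
        match = LINK_RE.match(line)
        if not match:
            continue
        link_count += 1
        url, rest = match.group(2), match.group(3)
        if not url.startswith(("https://", "http://")):
            problems.append((lineno, f"relative URL: {url}"))
        if not re.match(r"\s*:\s*\S", rest):
            problems.append((lineno, "missing ': description'"))
    if link_count > max_links:
        problems.append((0, f"{link_count} links; curate instead of listing every URL"))
    return problems


def check_headers(headers):
    """Flag response headers that may make an engine discard /llms.txt."""
    h = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive
    problems = []
    media_type = h.get("content-type", "").split(";")[0].strip().lower()
    if media_type not in ("text/markdown", "text/plain"):
        problems.append(f"Content-Type is {media_type or 'missing'}")
    if h.get("content-disposition", "").strip().lower().startswith("attachment"):
        problems.append("Content-Disposition: attachment forces a download")
    return problems


def blocked_agents(robots_txt, url="https://example.com/llms.txt"):
    """Return the AI_AGENTS that robots.txt disallows from fetching url.

    The stdlib parser applies the first matching group and rule, which can
    differ from how individual crawlers resolve conflicting rules.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in AI_AGENTS if not parser.can_fetch(agent, url)]
```

To test a live robots.txt instead of a string, the standard library's RobotFileParser also provides set_url() and read(). The parser resolves rules on a first-match basis, so a clean result is a sanity check, not proof that every crawler will get through.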

Key Takeaway: The biggest llms.txt failures are silent — relative URLs, missing descriptions, nested headings, and CDN bot-blocking all cause AI engines to skip the file without telling anyone. Audit the file as if a stranger is reading it.

How llms.txt Fits Inside a Full AEO and GEO Strategy

Publishing llms.txt is necessary but not sufficient — the file is one of seven infrastructure layers that determine whether a brand gets cited inside ChatGPT, Perplexity, Claude, and Google AI Overviews answers. The full stack, in priority order:

Tier-1 editorial citations. Earned coverage in publications AI models trust (Forbes, Bloomberg, Reuters, TIME, industry-specific tier-1 outlets) is the single highest-weighted signal across every major AI engine. No technical optimization substitutes for being the brand that journalists at trusted outlets actually cite. See the media placements guide.
Complete schema markup. Organization, Service, FAQPage, Article, and BreadcrumbList schema on every relevant page. Schema is how AI engines confirm the entities on the page match the entities in their knowledge graph.
Wikidata and Knowledge Graph entries. The brand needs an active Wikidata entry with sameAs links to its real social profiles and a populated Google Knowledge Panel. Without these, the engine cannot anchor the brand as a real entity.
llms.txt manifest. The curated retrieval index covered in this guide.
AI-extractable content formatting. KEY TAKEAWAY blocks, definitional sentences immediately under question-based H2 headings, FAQ sections with FAQPage schema, and answer-first writing structure.
Open Graph metadata with branded social cards. AI engines increasingly retrieve OG metadata when summarizing a URL. Branded, accurate OG cards reinforce the entity.
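CDN allowlist for AI crawlers. If the AI engine cannot fetch the page, none of the other six layers matter. Allowlist GPTBot, ClaudeBot, PerplexityBot, and CCBot at the WAF, and keep Google-Extended allowed in robots.txt, since it is a robots.txt token rather than a separate crawler user agent.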
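The schema and Wikidata layers are usually delivered as a JSON-LD block in the page HTML. Here is a minimal Python sketch of an Organization object whose sameAs array points at the brand's Wikidata item and social profiles. The brand name, the Wikidata QID placeholder, and the profile URL are all hypothetical. The sketch covers only Organization, not the Service, FAQPage, Article, or BreadcrumbList types listed above.

```python
import json


def organization_jsonld(name, url, same_as):
    """Build a schema.org Organization object whose sameAs lists profile URLs."""
    return {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        "sameAs": list(same_as),  # Wikidata item, LinkedIn, and other real profiles
    }


def render_script_tag(data):
    """Wrap the object in the JSON-LD script tag that gets embedded in the page HTML."""
    return '<script type="application/ld+json">\n' + json.dumps(data, indent=2) + "\n</script>"


if __name__ == "__main__":
    org = organization_jsonld(
        "Example Co",  # hypothetical brand
        "https://example.com",
        [
            "https://www.wikidata.org/wiki/QXXXX",  # placeholder: your real Wikidata item
            "https://www.linkedin.com/company/example",  # hypothetical profile
        ],
    )
    print(render_script_tag(org))
```

Building the object as a dict and serializing it with json.dumps avoids hand-escaping quotes in brand names. The same dict can be checked with a structured-data validator before deploy.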

llms.txt sits at layer four because it amplifies the work in the other six layers but does not substitute for any of them. A brand with no editorial citations, no schema, no Wikidata entry, and no AI-extractable content will not be cited regardless of how perfect its llms.txt is. A brand with all six other layers and no llms.txt is leaving compounding upside on the table.

Key Takeaway: llms.txt is layer four of a seven-layer AEO/GEO stack — necessary but not sufficient. Pair it with tier-1 citations, schema, Wikidata, AI-extractable content, branded OG cards, and a CDN crawler allowlist.

For the full stack walkthrough, see the Zero-Click Marketing pillar guide. To find out where your own brand stands on all seven layers right now, run the free AI Visibility Audit.

Frequently Asked Questions

Common questions about llms.txt.

If You're Invisible in AI, You're Losing Clients Right Now.

See exactly how your company appears across AI, search, and investor research — and uncover the hidden gaps costing you trust and deals.

Latest llms.txt Articles

Fresh insights and tactical deep-dives published in the llms.txt cluster.

SEO & Content

7 B2B SEO Strategies for AI-Driven Search

Developing effective SEO strategies for AI-driven search in today's market stops competitors from stealing your traffic. Learn how to engineer AI citations.

Jul 6, 2026 · 14 min read

SEO & Content

llms.txt Example: Copy-Paste Template (2026)

Discover how a properly structured llms.txt example ensures autonomous artificial intelligence models correctly ingest and cite your most valuable brand assets.

Jun 29, 2026 · 21 min read

SEO & Content

Mastering llms.txt vs robots.txt for ai crawler compliance

Navigating digital visibility requires mastering llms.txt vs robots.txt for AI crawler compliance. Learn how to secure your data and boost brand authority.

Jun 26, 2026 · 15 min read

SEO & Content

llms.txt for AI Citations: What Actually Works

Discover how an llms.txt file maps your website for AI agents and RAG systems, helping you build verifiable brand authority in an era of zero-click search.

Jun 22, 2026 · 13 min read

PR Strategy

PR Opportunities: Your Strategic Blueprint for Brand

Finding the right PR opportunities is the difference between invisible brands and industry leaders. Learn how to turn media placements into lasting authority.

Jun 16, 2026 · 15 min read

SEO & Content

llms.txt Generator: The Complete Guide for AI Visibility

Are generative AI engines actually reading your content? Learn how an llms.txt generator structures data and whether it improves your brand's visibility.

Jun 13, 2026 · 14 min read
