PerplexityBot
PerplexityBot is Perplexity AI's web crawler. It fetches and indexes pages so that Perplexity can surface and link them when answering user queries; a separate agent, Perplexity-User, handles page fetches triggered directly by a user's request. Unlike training crawlers, PerplexityBot serves retrieval rather than model training, which means Perplexity citations depend more on whether the bot can currently fetch the page than on long-term training-data inclusion. Why it matters: Perplexity includes inline source links in almost every answer, which makes it one of the most citation-dense AI search engines. Allowing PerplexityBot and ensuring the page renders correctly for bots (no aggressive Cloudflare blocks, no content that appears only after JavaScript runs) is a prerequisite for capturing Perplexity traffic.
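One way to sanity-check the "no JS-only content" requirement is to look at the HTML exactly as the server delivers it, before any JavaScript runs. The following is a minimal sketch using only the Python standard library; the function names and the two sample pages are hypothetical, and it assumes the page HTML has already been downloaded (fetching it is left out so the example runs offline).

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping the bodies of <script> and <style>."""

    _SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self._SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self._SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def static_text(html: str) -> str:
    """Return the text a non-JavaScript client would see in the raw HTML."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def phrase_in_static_html(html: str, phrase: str) -> bool:
    """True if the phrase appears in the markup without running any scripts."""
    return phrase.lower() in static_text(html).lower()


# Hypothetical sample pages: one server-rendered, one that injects text via JS.
SERVER_RENDERED = "<html><body><h1>Pricing</h1><p>Plans start at $10.</p></body></html>"
JS_ONLY = (
    '<html><body><div id="root"></div>'
    '<script>render("Plans start at $10.")</script></body></html>'
)

print(phrase_in_static_html(SERVER_RENDERED, "Plans start at $10."))
print(phrase_in_static_html(JS_ONLY, "Plans start at $10."))
```

If a key sentence from the page is missing from the static text, a crawler that does not execute JavaScript is unlikely to see it either. This check says nothing about firewall or CDN blocks, which need to be verified separately in server logs.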
Learn more:
→ llms.txt Guide
Related Terms
AI Search Engine
An AI search engine is a search platform powered by artificial intelligence that shifts the search experience from a list of links to conversational, synthesized answers. Unlike traditional search engines, these platforms (such as Google Gemini, Microsoft Copilot, Perplexity, and ChatGPT Search) generate comprehensive responses, often citing multiple sources, rather than merely pointing to web pages. Why it matters: This shift means that for a brand's information to be included or cited, its content must exhibit strong entity signals, demonstrate high authority and factual accuracy, and be structured in a way that AI models can easily process and trust. The goal is to be a primary 'ingredient' in these AI-generated answers, rather than just a link on a results page. For example, a user asking "What are the benefits of [Brand X's] new service?" expects a direct answer citing the brand's official statements or authoritative reviews, not just a list of links to articles about it.
Perplexity AI
Perplexity AI is an AI-powered search engine designed to provide direct, cited answers to user queries by synthesizing information from multiple authoritative web sources. Unlike traditional search engines that mostly return lists of links, Perplexity aims to summarize and explain, often including direct quotes and links to the original sources it consulted to generate its response. Why it matters: For reputation management and SEO, being cited by Perplexity AI is a powerful indicator of authority and trustworthiness. Brands with strong topical authority, high-quality content, and well-structured data (like schema markup) are significantly more likely to be referenced in Perplexity's answers. This platform represents a key frontier in AI search, where content discoverability depends on being a primary source recognized by advanced AI systems.
Robots.txt
The robots.txt file is a plain text file placed in a website's root directory that tells search engine crawlers and AI bots which pages or sections of the site they may or may not crawl. It uses the Robots Exclusion Protocol to communicate directives such as 'Disallow' (block crawling) and 'Allow' (permit crawling) to specific user agents. Why it matters: Strategic robots.txt configuration is essential for managing crawl budget, keeping crawlers out of low-value or private sections, and, increasingly, controlling which AI training bots can access your content. Note that Disallow only blocks crawling: a blocked URL can still be indexed if other sites link to it, so pages that must stay out of search results need a noindex directive or authentication instead. For brands focused on AI search visibility, selectively allowing citation-focused bots (like ChatGPT-User and PerplexityBot) while blocking training-only crawlers (like GPTBot and CCBot) keeps your content available for AI-generated citations without offering it for unattributed model training, provided the crawlers honor the file. This nuanced approach to bot management is becoming a critical component of modern SEO and content protection strategy.
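The selective policy described above can be sketched and checked with Python's standard urllib.robotparser module. The robots.txt text below is an illustrative assumption, not a recommended production file; the /private/ path and the example.com URLs are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Illustrative policy: allow citation-focused agents, block training-only ones.
ROBOTS_TXT = """\
# Citation-focused agents: allowed everywhere
User-agent: PerplexityBot
User-agent: ChatGPT-User
Allow: /

# Training-only crawlers: blocked everywhere
User-agent: GPTBot
User-agent: CCBot
Disallow: /

# Everyone else: allowed except a hypothetical private area
User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text in memory (no network request is made)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
for agent in ("PerplexityBot", "ChatGPT-User", "GPTBot", "CCBot", "SomeOtherBot"):
    allowed = parser.can_fetch(agent, "https://example.com/guide")
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Running a draft file through a parser like this before deploying it is a cheap way to catch a rule that accidentally blocks a bot you meant to allow. Individual crawlers may interpret edge cases differently from Python's parser, so treat the result as a first check rather than a guarantee.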
AI Search
AI search is the broad category of search experiences powered by artificial intelligence and large language models, where users receive synthesized, conversational answers instead of (or alongside) traditional lists of links. This includes Google AI Overviews, ChatGPT Search, Perplexity, Microsoft Copilot, and Gemini. Why it matters: AI search has shifted the SEO playing field. Ranking on page one of Google is no longer enough: brands must also be cited by AI models when users ask questions about their industry, products, or expertise. AI search systems prioritize sources with strong entity signals, consistent brand mentions across authoritative sites, structured data, and content that directly answers user intent. Optimizing for AI search means building digital authority through PR, earning media mentions, implementing schema markup, and creating content that AI models can easily understand, trust, and reference in their generated responses.
llms.txt
llms.txt is a proposed plain-text (Markdown) file placed at the root of a website (e.g. /llms.txt) that summarizes the site's purpose, lists its most important canonical URLs, and provides AI crawlers with a compact, structured map of what the site is authoritative on. It is the AI-engine analog to robots.txt and sitemap.xml, designed specifically to help large language models index, ground, and cite the right pages. Why it matters: As ChatGPT, Perplexity, Claude, Google AI Overviews, and Microsoft Copilot increasingly drive discovery, llms.txt is becoming a meaningful infrastructure layer for AEO (answer engine optimization) and GEO (generative engine optimization). A well-crafted llms.txt tells AI engines which pillar guides, services, and authoritative resources to cite when answering questions in the brand's domain, reducing the risk of being misrepresented or omitted. Sites without llms.txt are not penalized, but sites with a clean, accurate llms.txt give themselves a structural advantage in AI citation outcomes. Smart Money Media's own llms.txt is publicly available at /llms.txt, and any site can generate a spec-compliant file in 30 seconds with our free llms.txt generator at /tools/llms-txt-generator.
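To make the file's shape concrete, here is a minimal sketch of a builder. It assumes the layout described in the llms.txt proposal (an H1 title, a blockquote summary, then H2 sections containing Markdown link lists). The Link class, the build_llms_txt function, and the example.com entries are all hypothetical, and this is not the generator mentioned above.

```python
from dataclasses import dataclass


@dataclass
class Link:
    """One entry in an llms.txt section (hypothetical helper type)."""
    title: str
    url: str
    note: str = ""


def build_llms_txt(site_name: str, summary: str, sections: dict[str, list[Link]]) -> str:
    """Render a title, a one-line summary, and grouped link lists as Markdown."""
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for link in links:
            suffix = f": {link.note}" if link.note else ""
            lines.append(f"- [{link.title}]({link.url}){suffix}")
        lines.append("")
    # Normalize to exactly one trailing newline.
    return "\n".join(lines).rstrip() + "\n"


# Placeholder content for illustration only.
text = build_llms_txt(
    "Example Co",
    "Example Co publishes guides on widgets.",
    {
        "Guides": [
            Link("Pricing guide", "https://example.com/pricing", "Plans and costs"),
            Link("About", "https://example.com/about"),
        ]
    },
)
print(text)
```

Keeping the link list short and limited to canonical URLs matches the goal stated above: pointing AI engines at the pages the site most wants cited.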
GPTBot
GPTBot is OpenAI's web crawler used to gather training data for future GPT models. Site owners control GPTBot access via robots.txt: allowing it permits OpenAI to use the site's content in training, while disallowing it excludes the site. GPTBot is distinct from OAI-SearchBot (which fetches pages for ChatGPT Search, formerly SearchGPT) and ChatGPT-User (which fetches a page on a user's behalf, for example when a user pastes a URL into ChatGPT). Why it matters: Allowing GPTBot increases the long-term probability that ChatGPT can recall and cite a brand from training memory without needing a live web fetch. For most B2B brands, allowing all three OpenAI user agents is the correct default.
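Because the three OpenAI agents serve different purposes, it can help to separate them when reviewing server logs. The sketch below matches the agent tokens named above against User-Agent strings. The function name, the purpose labels, and the sample User-Agent strings are illustrative assumptions; real strings sent by these agents may differ in format and version.

```python
from collections import Counter
from typing import Optional

# Agent tokens and purposes as described in the entry above.
OPENAI_AGENTS = {
    "GPTBot": "training",
    "OAI-SearchBot": "search",
    "ChatGPT-User": "user-initiated fetch",
}


def classify_user_agent(user_agent: str) -> Optional[tuple[str, str]]:
    """Return (token, purpose) for a known OpenAI agent, else None."""
    ua = user_agent.lower()
    for token, purpose in OPENAI_AGENTS.items():
        if token.lower() in ua:
            return token, purpose
    return None


# Illustrative User-Agent values, not copied from real traffic.
SAMPLE_LOG_AGENTS = [
    "Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)",
    "Mozilla/5.0 (compatible; OAI-SearchBot/1.0; +https://openai.com/searchbot)",
    "Mozilla/5.0 (compatible; ChatGPT-User/1.0; +https://openai.com/bot)",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Firefox/120.0",
    "Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)",
]

hits = Counter()
for ua in SAMPLE_LOG_AGENTS:
    match = classify_user_agent(ua)
    if match is not None:
        hits[match[0]] += 1

print(dict(hits))
```

User-Agent headers can be spoofed, so counts like these indicate claimed identity only; confirming a crawler's origin requires checking its source IP against the ranges the operator publishes.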