    Crawl Budget

    The number of pages a search engine will crawl on your site within a given timeframe. Large sites can make better use of their crawl budget by eliminating duplicate pages, fixing broken links, and using XML sitemaps so that important pages get discovered and indexed. Why it matters: When crawl budget is spent inefficiently, search engines may miss critical pages, and a page that is never crawled cannot rank. This is especially relevant for large websites with thousands of pages. If a crawler spends too much of its time on low-value, duplicate, or broken pages, it may be slow to reach important content such as new product launches or high-value thought leadership articles, delaying their visibility in search results and in AI search models. Managing crawl budget helps ensure that SEO and PR efforts, particularly around new content, are not hampered by technical inefficiencies.

    Related Terms

    Robots.txt

    The robots.txt file is a plain text file placed in a website's root directory that tells search engine crawlers and AI bots which pages or sections of the site they may or may not crawl. It uses the Robots Exclusion Protocol to communicate directives like 'Disallow' (block crawling) and 'Allow' (permit crawling) to specific user agents. Compliance is voluntary: the file only affects crawlers that choose to honor it. Why it matters: Strategic robots.txt configuration is essential for managing crawl budget, keeping crawlers out of low-value or private sections, and, increasingly, controlling which AI bots can access your content. Note that 'Disallow' blocks crawling, not indexing: a blocked URL can still appear in search results if other pages link to it, so robots.txt alone does not keep sensitive pages out of an index. For brands focused on AI search visibility, selectively allowing citation-focused bots (like ChatGPT-User and PerplexityBot) while blocking training-only crawlers (like GPTBot and CCBot) keeps your content available for AI-generated citations while asking training crawlers not to collect it for unattributed model training. This nuanced approach to bot management is becoming a critical component of modern SEO and content protection strategy.
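
    The allow/block split described above can be sketched with Python's standard-library robots.txt parser. The rules below, the /internal/ path, and the example.com URL are illustrative assumptions rather than a recommended policy; the bot names are the ones mentioned in the text.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt (an assumption, not a recommended policy):
# block training-only crawlers, allow citation-focused bots, and keep
# every other crawler out of a hypothetical /internal/ section.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: ChatGPT-User
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: *
Disallow: /internal/
"""


def crawl_permissions(robots_txt, user_agents, url):
    """Return {user_agent: bool} saying whether each agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in user_agents}


if __name__ == "__main__":
    agents = ["GPTBot", "CCBot", "ChatGPT-User", "PerplexityBot", "Googlebot"]
    url = "https://example.com/blog/post"
    for agent, allowed in crawl_permissions(ROBOTS_TXT, agents, url).items():
        print(f"{agent:15} {'allowed' if allowed else 'blocked'}")
```

    Checking a draft file this way before deploying it is a cheap guard against accidentally blocking a crawler you meant to allow. It only shows what a compliant crawler would do.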

    XML Sitemap

    An Extensible Markup Language (XML) file that serves as a roadmap of the important URLs on a website that you want search engines to crawl and index. It gives search engines a clear, structured list of valuable pages and can include optional metadata for each URL: when the page was last modified (lastmod), how frequently it is updated (changefreq), and its relative importance within the site (priority). Webmasters typically submit their XML sitemap to tools like Google Search Console to facilitate faster discovery and indexing of new or updated content. Why it matters: A well-maintained XML sitemap helps search engines discover relevant content efficiently, especially on large websites or those with complex structures. It helps search engines, and by extension AI models that learn from indexed content, understand your site's full scope, which makes your brand's information more readily available for inclusion in search results and AI-generated answers.
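
    As a rough illustration of the file format, the sketch below builds a small sitemap with Python's standard library. The function name build_sitemap, the URLs, and the dates are placeholders invented for this example; only loc and lastmod are emitted, and the optional changefreq and priority tags are left out for brevity.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build sitemap XML from (url, lastmod) pairs.

    lastmod is expected as an ISO date string such as "2024-01-15".
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc          # special characters are escaped on output
        ET.SubElement(url, "lastmod").text = lastmod
    xml_bytes = ET.tostring(urlset, encoding="utf-8", xml_declaration=True)
    return xml_bytes.decode("utf-8")


if __name__ == "__main__":
    # Placeholder pages, not real URLs.
    print(build_sitemap([
        ("https://example.com/", "2024-01-15"),
        ("https://example.com/products?id=1&ref=a", "2024-02-01"),
    ]))
```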

    Canonical Tag

    An HTML link element (rel="canonical", placed in a page's head) that signals to search engines which version of a URL is the 'master' copy. Canonical tags help prevent duplicate content issues when the same page is accessible via multiple URLs, consolidating link equity and making it more likely that the correct page gets indexed. Search engines treat the tag as a strong hint rather than a binding directive. Why it matters: In reputation management and SEO, duplicate content can dilute search visibility and confuse search engines, preventing the preferred version of a page from ranking. For example, if an e-commerce site has a product page accessible via example.com/product and example.com/category/product, without a canonical tag, search engines might see these as two separate pages with identical content, potentially splitting their ranking power. A canonical tag pointing to the preferred URL consolidates that SEO credit on one page, so the primary page is the one most likely to rank and a less desired version is less likely to appear in search results or be indexed by AI search models.
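
    To make the example above concrete, the following sketch extracts the canonical URL from a page's HTML using only the standard library, which is one way to audit whether duplicate URLs point at the preferred version. The class and function names, and the sample markup, are hypothetical.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> it sees."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        # rel can hold several space-separated values, so split before checking.
        rel_values = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_values and attr.get("href"):
            self.canonical = attr["href"]


def find_canonical(html):
    """Return the canonical URL declared in html, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


if __name__ == "__main__":
    # Hypothetical markup served at example.com/category/product.
    page = """
    <html><head>
      <title>Product</title>
      <link rel="canonical" href="https://example.com/product">
    </head><body>...</body></html>
    """
    print(find_canonical(page))
```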

    E-E-A-T

    E-E-A-T stands for Experience, Expertise, Authoritativeness, and Trustworthiness, a framework from Google's Search Quality Rater Guidelines for evaluating the quality and credibility of content, especially for YMYL (Your Money or Your Life) topics. Demonstrating strong E-E-A-T involves showcasing author credentials, citing credible sources, providing real-world examples, and building a reputable online presence. Why it matters: In the age of AI search, E-E-A-T matters beyond traditional rankings. Content exhibiting high E-E-A-T is not only more likely to rank well in traditional search but also to be selected, synthesized, and cited by AI Overviews and generative AI tools. For PR professionals, building E-E-A-T involves securing media mentions, expert quotes, and positive reviews that validate the standing of a brand and its spokespeople, directly impacting both human perception and how AI models understand and value your brand's information.

    Entity SEO

    Entity SEO is a search engine optimization strategy that goes beyond traditional keyword-centric approaches by focusing on establishing your brand, people, products, or concepts as recognized "entities" within Google's Knowledge Graph and other semantic knowledge bases. This involves ensuring consistent Name, Address, Phone (NAP) data across online directories, implementing structured data markup (using the Schema.org vocabulary), building a presence on authoritative platforms like Wikipedia and Wikidata, and securing mentions from credible sources. Why it matters: By clearly defining your brand as an entity, you help search engines and AI models understand who you are, what you do, and how you relate to other entities. This enhances your E-E-A-T, improves the chances of appearing in Knowledge Panels and AI Overviews, and increases the likelihood that AI systems will accurately identify and trust your brand's information, making it a foundational element for success in the evolving landscape of AI search.
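
    One common way to implement the structured data mentioned above is a JSON-LD block using the Schema.org Organization type. The sketch below generates one in Python. The helper names and every value in the usage example (company name, phone number, address, profile URLs) are placeholders; the property names come from the Schema.org vocabulary.

```python
import json


def organization_jsonld(name, url, same_as, telephone=None, address=None):
    """Return a JSON-LD string describing an organization.

    same_as lists profile URLs for the same entity (for example a Wikidata
    or directory page); address is a dict of Schema.org PostalAddress fields.
    """
    data = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        "sameAs": list(same_as),
    }
    if telephone:
        data["telephone"] = telephone
    if address:
        data["address"] = {"@type": "PostalAddress", **address}
    return json.dumps(data, indent=2)


def as_script_tag(jsonld):
    """Wrap JSON-LD in the script element used to embed it in a page."""
    return f'<script type="application/ld+json">\n{jsonld}\n</script>'


if __name__ == "__main__":
    # All values below are placeholders for illustration only.
    jsonld = organization_jsonld(
        name="Example Co",
        url="https://example.com",
        same_as=["https://example.org/profiles/example-co"],
        telephone="+1-555-0100",
        address={"streetAddress": "1 Example St", "addressLocality": "Springfield"},
    )
    print(as_script_tag(jsonld))
```

    Generating the markup from a single source of truth, rather than hand-editing it per page, is one way to keep the NAP details identical everywhere they appear.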

    Indexing

    Indexing is the process by which search engines analyze and store the web pages they have discovered and crawled. When a search engine's crawlers (also called spiders) visit a website, they read its content, analyze its structure, and follow links to other pages. This information is then organized and added to the search engine's index, making the page eligible to appear in search results. Why it matters: For any website or piece of content to appear in search engine results, and consequently be considered by AI search models, it must first be indexed. If a page isn't in the index, it cannot rank. SEO and PR efforts therefore depend on content being technically accessible and structured in a way that facilitates efficient crawling and indexing. Monitoring indexing status through tools like Google Search Console is vital for maintaining online visibility and ensuring content reaches its intended audience.
