01 The Rule
Every site must provide an XML sitemap listing all canonical, indexable URLs. Sites exceeding 50,000 URLs must use a sitemap index, because a single sitemap file may contain at most 50,000 URLs and 50 MB uncompressed. Only include URLs that return 200 and are not blocked by robots.txt or noindex.
Last updated: 2026-01-15
Sitemaps are the most direct way to tell search engines about your URL space. For large sites, sitemaps are often the primary discovery mechanism for deep content that isn't well-linked internally. Accurate lastmod values help search engines decide which URLs to recrawl first.
Including noindex or redirecting URLs in the sitemap: conflicting signals, because the sitemap says 'index this' while the page says 'don't'.
Stale or inaccurate lastmod dates (never updated, or updated without a content change): Google may stop trusting lastmod values across the whole site.
Single monolithic sitemap for a site with 1M+ URLs: the file exceeds the 50,000-URL and 50 MB per-file limits, so crawlers may reject it or abandon the download.
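The mistakes above can be caught before a sitemap is published. The following is a minimal sketch, assuming crawl results are already available as records. The `PageRecord` type and its field names are hypothetical and not part of any particular crawler; the check itself only restates the rule (200 status, no noindex, not blocked by robots.txt, self-referencing canonical).

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class PageRecord:
    """Hypothetical crawl record; field names are illustrative only."""
    url: str
    status: int           # HTTP status returned when the URL was fetched
    canonical: str        # value of the page's canonical tag
    noindex: bool         # True if the page carries a noindex directive
    robots_blocked: bool  # True if robots.txt disallows the URL


def sitemap_problems(page: PageRecord) -> List[str]:
    """Return the reasons a URL should be kept out of the sitemap."""
    problems = []
    if page.status != 200:
        problems.append(f"returns {page.status}, not 200")
    if page.noindex:
        problems.append("marked noindex")
    if page.robots_blocked:
        problems.append("blocked by robots.txt")
    if page.canonical != page.url:
        problems.append(f"canonical points to {page.canonical}")
    return problems


def eligible_urls(pages: Iterable[PageRecord]) -> List[str]:
    """Keep only URLs with no problems, preserving input order."""
    return [p.url for p in pages if not sitemap_problems(p)]
```

Running `sitemap_problems` over the URLs of an existing sitemap gives a per-URL list of reasons, which is usually easier to act on than a plain pass/fail flag.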
Generate sitemaps programmatically from your canonical URL database. Validate that every sitemap URL returns 200 and matches the canonical tag on its own page. Segment by content type, keep each file within the 50,000-URL limit, and update lastmod only on actual content changes.
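One way to generate segmented sitemaps with an index is sketched below, using only Python's standard library. It assumes the canonical URL database can yield `(content_type, loc, lastmod)` rows with `lastmod` already formatted as a `YYYY-MM-DD` string reflecting the last real content change. The function names and the `sitemap-<type>-<n>.xml` naming scheme are illustrative assumptions, not a required convention.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50_000  # per-file URL limit from the sitemaps.org protocol


def build_urlset(entries):
    """Serialize (loc, lastmod) pairs as one <urlset> document."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


def build_sitemaps(rows, base_url, max_urls=MAX_URLS_PER_FILE):
    """Group rows by content type, split each group into files of at most
    max_urls entries, and return {filename: xml}, including the index."""
    by_type = defaultdict(list)
    for content_type, loc, lastmod in rows:
        by_type[content_type].append((loc, lastmod))

    files = {}
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for content_type in sorted(by_type):
        entries = by_type[content_type]
        for part, start in enumerate(range(0, len(entries), max_urls), start=1):
            chunk = entries[start:start + max_urls]
            name = f"sitemap-{content_type}-{part}.xml"
            files[name] = build_urlset(chunk)
            sitemap = ET.SubElement(index, "sitemap")
            ET.SubElement(sitemap, "loc").text = f"{base_url}/{name}"
            # Same-format ISO dates sort lexicographically, so max() picks
            # the most recent change in the chunk.
            ET.SubElement(sitemap, "lastmod").text = max(lm for _, lm in chunk)
    files["sitemap-index.xml"] = ET.tostring(index, encoding="unicode")
    return files
```

For example, `build_sitemaps(rows, "https://example.com")` returns a dictionary that can be written to disk file by file. This sketch enforces only the URL-count limit, so a production job would also need to check the 50 MB uncompressed size limit and prepend an XML declaration when writing each file.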