The Rule
Use robots.txt to block crawlers from low-value URL spaces: faceted navigation parameters, internal search results, admin areas, and session-based URLs. Never block CSS or JavaScript resources required for rendering.
Last updated: 2026-01-10
Robots.txt is the first line of crawl budget defense. Every URL path that's crawlable but shouldn't be indexed wastes crawl capacity, and for sites with millions of faceted URLs, targeted robots.txt rules can eliminate the large majority of wasted crawl requests. Keep in mind that robots.txt controls crawling, not indexing: a blocked URL can still appear in search results if it's linked externally, so use a noindex directive (on a crawlable page) when you need a URL removed from the index.
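As a sketch of what targeted rules look like, here is an illustrative robots.txt for a faceted e-commerce site. All path and parameter names are examples, not a template to copy verbatim; Google supports the `*` and `$` pattern syntax used below.

```
# Illustrative robots.txt -- adjust patterns to your own URL space
User-agent: *
# Faceted navigation and sorting parameters
Disallow: /*?*sort=
Disallow: /*?*color=
# Internal search results
Disallow: /search
# Admin areas and session-based URLs
Disallow: /admin/
Disallow: /*sessionid=
# Never block rendering resources
Allow: /*.css$
Allow: /*.js$
```

The explicit Allow lines are a safety net: if a broader Disallow pattern ever overlaps a stylesheet or script path, the more specific Allow keeps rendering resources crawlable.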
Common mistakes:
- Blocking CSS/JS resources: Googlebot can't render pages, mobile-first indexing fails, and content becomes invisible.
- No robots.txt file at all: the entire URL space is crawlable, including parameters, internal search, and admin areas.
- Overly broad Disallow rules: important content is blocked from crawling and drops out of search results.
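Before deploying rules, you can sanity-check them against representative URLs with Python's standard-library `urllib.robotparser`. Caveat: it implements simple prefix matching with Allow/Disallow but not the `*`/`$` wildcard syntax Google supports, so wildcard rules should be tested in Google Search Console instead. The rules and URLs below are illustrative.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for an example site; adjust paths to your own URL space.
rules = """\
User-agent: *
Disallow: /search
Disallow: /admin/
Disallow: /cart
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# Rendering resources and real content must stay crawlable.
assert rp.can_fetch("Googlebot", "https://example.com/assets/main.css")
assert rp.can_fetch("Googlebot", "https://example.com/products/blue-widget")

# Low-value spaces should be blocked (the query string is included in matching).
assert not rp.can_fetch("Googlebot", "https://example.com/search?q=shoes")
assert not rp.can_fetch("Googlebot", "https://example.com/admin/login")
```

Running a check like this against a list of your top-ranking URLs catches overly broad Disallow rules before they cost you traffic.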
Audit your crawlable URL space using log file analysis. Identify URL patterns consuming crawl budget without generating organic traffic. Add targeted Disallow rules for those patterns while ensuring important content remains accessible.
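The audit step can be sketched as a small log parser that groups Googlebot requests by path prefix and query-parameter names, surfacing the URL patterns that dominate crawl activity. The regex assumes a combined-log-style format and the user-agent filter is a crude substring check; in production, verify Googlebot via reverse DNS.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Matches the request target in a combined-log-format line.
REQUEST = re.compile(r'"(?:GET|HEAD) (\S+) HTTP')

def crawl_budget_report(log_lines):
    """Count Googlebot hits per (path prefix, sorted query parameter names)."""
    counts = Counter()
    for line in log_lines:
        if "Googlebot" not in line:  # crude UA filter; confirm with rDNS in production
            continue
        m = REQUEST.search(line)
        if not m:
            continue
        parts = urlsplit(m.group(1))
        prefix = "/" + parts.path.lstrip("/").split("/", 1)[0]
        params = ",".join(sorted(p.split("=", 1)[0]
                                 for p in parts.query.split("&") if p))
        counts[(prefix, params)] += 1
    return counts
```

Patterns with high hit counts but little or no organic traffic (cross-reference your analytics) are the candidates for new Disallow rules.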