Robots.txt Test
The Robots.txt Checker fetches your site's /robots.txt file, validates its syntax, simulates how Googlebot, Bingbot and other major crawlers will interpret each rule, and flags directives that may be unintentionally blocking important pages from search. Robots.txt controls what compliant search engine crawlers may and may not crawl on your site, so a one-character typo can block them from your entire site, homepage included. This tool catches those mistakes in seconds and shows you exactly which URLs are at risk.
What This Tool Checks
- Robots.txt presence at /robots.txt and HTTP 200 response
- Valid syntax (User-agent, Disallow, Allow, Sitemap, Crawl-delay; note that Googlebot ignores Crawl-delay)
- Per-bot rule simulation (Googlebot, Googlebot-Image, Bingbot, etc.)
- Disallow rules that block CSS, JS or essential resources
- Sitemap: declaration with absolute URL
- Conflicting Allow / Disallow rules
- Catastrophic patterns ("Disallow: /") that block the entire site
Why It Matters for SEO
Robots.txt is the highest-risk file on your website from an SEO perspective. A single line, "Disallow: /", stops compliant crawlers from fetching any page and can drop your entire site out of Google's search results. Even smaller mistakes, such as blocking CSS or JavaScript that Googlebot needs to render the page, can cause Google to misinterpret your layout and mobile usability, which can hurt your rankings. On the positive side, a well-tuned robots.txt focuses crawl budget on your most valuable URLs.
How to Fix It
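Keep robots.txt minimal. Allow all crawlers by default, disallow only specific paths that genuinely should not be crawled (admin areas, internal search results), and add a Sitemap: line pointing to your sitemap. Never use robots.txt to hide URLs you want kept out of the index: a blocked URL can still be indexed if other pages link to it. Use a meta robots noindex tag on the page itself instead, and leave that page crawlable, because a crawler can only see the tag on pages it is allowed to fetch.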
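As a rough illustration of the advice above, the sketch below embeds a minimal robots.txt and checks a few paths against it with Python's standard-library urllib.robotparser. The domain, the disallowed paths and the is_crawlable helper are placeholders chosen for this example, not recommendations for any particular site. Note that urllib.robotparser is simpler than Google's parser (it does not implement wildcards or longest-match precedence), which is sufficient for plain prefix rules like these.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical minimal robots.txt: everything is crawlable except two
# placeholder paths, and the sitemap is declared with an absolute URL.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /search

Sitemap: https://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_crawlable(path, agent="Googlebot"):
    """Return True if `agent` may fetch `path` under ROBOTS_TXT (illustrative helper)."""
    return parser.can_fetch(agent, "https://www.example.com" + path)


for sample in ("/", "/products/widget", "/admin/settings", "/search?q=shoes"):
    print(sample, "->", "allowed" if is_crawlable(sample) else "blocked")
```

The homepage and product page stay crawlable, while the admin and internal search paths are blocked. RobotFileParser.site_maps() (Python 3.8 and later) returns the declared sitemap URLs.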
How It Works
We request /robots.txt with each major crawler's actual user-agent string, parse the file using the same rule-precedence logic Google's open-source robots.txt parser uses (the most specific matching rule wins), and report which paths each bot may and may not crawl. Sample URLs from your site are tested against the rules so you can see exactly what is blocked.
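The precedence logic can be sketched roughly as follows. This is a simplified illustration and not this tool's or Google's actual implementation: it assumes the rules for the matching user-agent group have already been extracted, and it skips details such as percent-encoding normalization. It follows the documented behavior that the longest matching pattern wins and that Allow wins a tie. The function names and the sample rules are hypothetical.

```python
import re
from typing import Iterable, Tuple


def _pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end of the URL.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_allowed(rules: Iterable[Tuple[str, str]], path: str) -> bool:
    """Decide whether `path` may be crawled.

    `rules` holds ("allow" | "disallow", pattern) pairs for one user-agent group.
    """
    best_len = -1
    allowed = True  # no matching rule means crawling is allowed
    for kind, pattern in rules:
        if not pattern:  # an empty pattern matches nothing
            continue
        if _pattern_to_regex(pattern).match(path):
            is_allow = kind == "allow"
            # Longest pattern wins; on equal length, Allow beats Disallow.
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len, allowed = len(pattern), is_allow
    return allowed


rules = [
    ("disallow", "/private/"),
    ("allow", "/private/press/"),
    ("disallow", "/*.pdf$"),
]
print(is_allowed(rules, "/private/report.html"))        # False
print(is_allowed(rules, "/private/press/launch.html"))  # True
print(is_allowed(rules, "/docs/guide.pdf"))             # False
```

Here /private/press/launch.html is crawlable even though /private/ is disallowed, because the Allow pattern is longer and therefore more specific.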
Common Mistakes to Avoid
- Leaving a development "Disallow: /" rule in production
- Blocking /wp-content/ or /assets/ so Google cannot load CSS/JS
- Forgetting the Sitemap: directive (can slow discovery of new URLs)
- Trying to use robots.txt to hide pages from search (use noindex instead)
- Case-sensitivity errors in paths (/Admin/ vs. /admin/): directive names such as Disallow are case-insensitive, but the path values they match are not
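The first mistake in the list above is easy to check for automatically. The following is a minimal sketch, not this tool's actual code, that scans a robots.txt body for blanket "Disallow: /" rules and reports which user-agent group each one applies to. The function name and the sample file are hypothetical, and the sketch does not catch equivalent patterns such as "Disallow: /*".

```python
def find_blanket_disallows(robots_txt):
    """Return (line_number, user_agents) for each 'Disallow: /' rule found."""
    findings = []
    agents = []        # user-agents of the group currently being read
    in_rules = False   # True once the current group has at least one rule
    for number, raw in enumerate(robots_txt.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()  # directive names are matched case-insensitively
        if field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                agents, in_rules = [], False
            agents.append(value)
        elif field in ("allow", "disallow"):
            in_rules = True
            if field == "disallow" and value == "/":
                findings.append((number, list(agents)))
    return findings


sample = """\
User-agent: *
DISALLOW: /

User-agent: Googlebot
User-agent: Bingbot
Disallow: /admin/
"""
print(find_blanket_disallows(sample))  # [(2, ['*'])]
```

Running a check like this in a deployment pipeline is one way to keep a development-only rule from reaching production.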
Quick Checklist
- /robots.txt returns HTTP 200
- No accidental "Disallow: /" rule
- CSS, JS and image directories are crawlable
- Sitemap: line points to the absolute sitemap URL
- Rules are interpreted the same way in this tool as in Google Search Console