Build a clean, search-engine-friendly robots.txt for your website in seconds. Control exactly which pages Googlebot and other crawlers can and can't crawl. No signup required.
robots.txt is a plain-text file that lives at the root of your website (e.g. https://yoursite.com/robots.txt) and tells search engine crawlers — Googlebot, Bingbot, and friends — which parts of your site they're allowed to fetch.
Crawl control. Rules are grouped by the bot they target (a specific crawler, or all bots via User-agent: *), and each rule either allows or disallows a URL path. Well-behaved crawlers fetch robots.txt before anything else on your site, so your rules apply before a single page is fetched.
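To make the group structure concrete, here is a minimal Python sketch of how rules could be rendered into a robots.txt file. The Group class and build_robots_txt function are hypothetical names for illustration, not this generator's actual code, and the paths and sitemap URL are placeholders.

```python
from collections.abc import Sequence
from dataclasses import dataclass, field


@dataclass
class Group:
    """One robots.txt group (hypothetical shape, for illustration only)."""
    user_agents: list[str]                             # e.g. ["*"] or ["Googlebot"]
    disallow: list[str] = field(default_factory=list)  # paths crawlers should skip
    allow: list[str] = field(default_factory=list)     # exceptions inside blocked paths


def build_robots_txt(groups: Sequence[Group], sitemaps: Sequence[str] = ()) -> str:
    """Render groups (and optional sitemap URLs) as robots.txt text."""
    lines: list[str] = []
    for group in groups:
        lines.extend(f"User-agent: {agent}" for agent in group.user_agents)
        # Allow before Disallow: Google ignores order (longest match wins), and
        # first-match parsers then see a narrow exception before the broader block.
        lines.extend(f"Allow: {path}" for path in group.allow)
        lines.extend(f"Disallow: {path}" for path in group.disallow)
        lines.append("")  # a blank line separates groups
    lines.extend(f"Sitemap: {url}" for url in sitemaps)
    return "\n".join(lines).rstrip("\n") + "\n"


robots = build_robots_txt(
    [Group(["*"], disallow=["/admin/", "/cart/", "/search"])],
    sitemaps=["https://yoursite.com/sitemap.xml"],
)
print(robots)
```

Writing Allow lines first is a cautious default for the common case where an Allow carves an exception out of a broader Disallow: Google picks the most specific rule regardless of order, but simpler parsers such as Python's urllib.robotparser use the first rule that matches.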
Blocking pages. Many sites use robots.txt to keep crawlers out of admin dashboards, internal search results, staging URLs, shopping-cart endpoints, and other pages that offer nothing to searchers. This saves crawl budget and keeps crawlers from wasting requests on thin or duplicate content.
SEO importance. A well-configured robots.txt focuses crawlers on the pages that do matter, points them to your sitemap.xml for faster discovery, and keeps them from spending time on duplicate or low-value URLs. A misconfigured one can accidentally block crawlers from your entire site, so test before you publish.
Wildcards (*) and end-of-URL anchors ($) are supported.
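The sketch below shows one way to evaluate * and $ patterns in Python, following the matching rules Google documents and RFC 9309 standardizes: * matches any run of characters, a trailing $ anchors the pattern to the end of the URL, the longest matching pattern wins, and on a tie Allow beats Disallow. rule_to_regex and is_allowed are illustrative helpers, the rules are placeholders, and the path argument is assumed to be the URL path plus any query string.

```python
import re

# Placeholder rules: block PDFs anywhere and /private/, except the press kit.
RULES = [
    ("Disallow", "/*.pdf$"),
    ("Disallow", "/private/"),
    ("Allow", "/private/press-kit"),
]


def rule_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex matched from the path start."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape everything literally, then turn each escaped '*' back into '.*'.
    body = re.escape(pattern).replace(r"\*", ".*")
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(path: str, rules: list[tuple[str, str]]) -> bool:
    """Google-style precedence: longest matching pattern wins, Allow wins ties."""
    best_len, allowed = -1, True  # no matching rule means the URL is allowed
    for directive, pattern in rules:
        if not pattern:  # an empty rule value (e.g. a bare "Disallow:") matches nothing
            continue
        if rule_to_regex(pattern).match(path):
            is_allow = directive.lower() == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len, allowed = len(pattern), is_allow
    return allowed


print(is_allowed("/files/report.pdf", RULES))       # False: matches /*.pdf$
print(is_allowed("/files/report.pdf?dl=1", RULES))  # True: $ requires the URL to end in .pdf
print(is_allowed("/private/press-kit.zip", RULES))  # True: the longer Allow wins
print(is_allowed("/private/notes", RULES))          # False
```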
Everything you need to know about robots.txt, SEO, and getting the most out of this generator.
robots.txt is a plain-text file at the root of your site that tells crawlers which URLs they may request. Rules are organized into groups: each group starts with one or more User-agent: lines (naming the bots it targets), followed by Allow: or Disallow: directives. Well-behaved crawlers fetch this file before anything else and respect what they find.
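Python's standard library ships a robots.txt parser, urllib.robotparser, which is handy for seeing how groups behave. The rules below are placeholders. Keep in mind that this parser applies the first rule that matches and does plain prefix matching, so it does not understand * or $ the way Google does.

```python
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: *
Disallow: /admin/

User-agent: Googlebot
Disallow: /drafts/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

print(parser.can_fetch("Googlebot", "https://yoursite.com/drafts/post"))  # False
print(parser.can_fetch("Googlebot", "https://yoursite.com/admin/"))       # True
print(parser.can_fetch("Bingbot", "https://yoursite.com/admin/"))         # False
```

The second result is the one that surprises people: a crawler follows only the group that names it most specifically, so Googlebot ignores the User-agent: * group entirely here. To keep /admin/ blocked for Googlebot, the Disallow line would have to be repeated in its group.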
The file must live at the root of your domain and be served from https://yoursite.com/robots.txt. Crawlers never look for it in subfolders, and each subdomain (for example, shop.yoursite.com) needs its own file. For WordPress, drop it in the site root via FTP or your hosting file manager. For Next.js, put it in public/robots.txt. For Shopify or Squarespace, use their built-in robots.txt editor.
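Because crawlers only check the root, the robots.txt location for any page can be derived from its URL alone. A minimal sketch, using a hypothetical robots_url helper: keep the scheme and host (including any port) and replace everything else with /robots.txt. It also shows why a subdomain is covered only by its own file.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL that governs page_url (hypothetical helper)."""
    parts = urlsplit(page_url)
    # Keep scheme and host (with port); drop the path, query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("https://yoursite.com/blog/2024/post?ref=nav"))
# https://yoursite.com/robots.txt
print(robots_url("https://shop.yoursite.com/products/1"))
# https://shop.yoursite.com/robots.txt
```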
Disallow in robots.txt blocks crawling — Google won't fetch the page. But if the page is linked from elsewhere, it can still show up in search results (without a description). To fully remove a page from the index, use the <meta name="robots" content="noindex"> tag and let Google crawl the page so it can see that tag. Don't combine Disallow + noindex — Google won't be able to read the noindex directive.
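One way to catch the Disallow + noindex conflict before it ships is to check every page that carries a noindex tag against your robots.txt. The sketch below assumes you already have that list of URLs (for example, from a crawl of your own site); blocked_noindex_pages is a hypothetical helper, and it relies on Python's urllib.robotparser, which does prefix matching only and ignores * and $ wildcards.

```python
from urllib.robotparser import RobotFileParser


def blocked_noindex_pages(robots_txt: str, noindex_urls: list[str],
                          agent: str = "Googlebot") -> list[str]:
    """Return noindex URLs that robots.txt stops `agent` from crawling.

    The crawler can't see a noindex tag on a page it isn't allowed to fetch,
    so every URL returned here is a likely mistake.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in noindex_urls if not parser.can_fetch(agent, url)]


robots_txt = "User-agent: *\nDisallow: /thank-you/\n"
pages_with_noindex = [  # placeholder URLs
    "https://yoursite.com/thank-you/",
    "https://yoursite.com/tag/misc/",
]
print(blocked_noindex_pages(robots_txt, pages_with_noindex))
# ['https://yoursite.com/thank-you/']
```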
Yes. Enable the "Block AI training bots" toggle above and we'll add Disallow rules for GPTBot, ClaudeBot, Google-Extended, CCBot, and PerplexityBot. These ask the crawlers run by OpenAI, Anthropic, Common Crawl, and Perplexity to stay off your site, and the Google-Extended token tells Google not to use your content for its generative AI models such as Gemini; regular Googlebot crawling for Search is unaffected. Note that less-scrupulous scrapers ignore robots.txt entirely, so server-side blocking is needed for those.
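For reference, here is a sketch of the kind of rules such a toggle could produce: one group per AI crawler, each disallowing the whole site. The token list mirrors the bots named above; ai_block_rules is a hypothetical helper, not this generator's source. A single group with several User-agent: lines would be equivalent.

```python
from urllib.robotparser import RobotFileParser

# The user-agent tokens named in the toggle description.
AI_TRAINING_BOTS = ("GPTBot", "ClaudeBot", "Google-Extended", "CCBot", "PerplexityBot")


def ai_block_rules(bots: tuple[str, ...] = AI_TRAINING_BOTS) -> str:
    """Hypothetical helper: one group per bot, each disallowing the whole site."""
    return "\n\n".join(f"User-agent: {bot}\nDisallow: /" for bot in bots) + "\n"


rules = ai_block_rules()
print(rules)

# Sanity check: the listed AI crawlers are blocked, ordinary Googlebot is not.
parser = RobotFileParser()
parser.parse(rules.splitlines())
print(parser.can_fetch("GPTBot", "https://yoursite.com/blog/"))     # False
print(parser.can_fetch("Googlebot", "https://yoursite.com/blog/"))  # True
```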
Not strictly required, but highly recommended. A Sitemap: directive in robots.txt tells crawlers exactly where to find your sitemap.xml, which speeds up discovery of new and updated pages. Use the full absolute URL: Sitemap: https://yoursite.com/sitemap.xml. You can include multiple sitemap lines if you have several.
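Python's urllib.robotparser (3.8 and later) can read Sitemap: lines back out of a file, which makes a quick check for relative sitemap URLs easy. non_absolute_sitemaps is a hypothetical helper, and the URLs are placeholders.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: *
Disallow: /admin/

Sitemap: https://yoursite.com/sitemap.xml
Sitemap: https://yoursite.com/sitemap-news.xml
"""


def non_absolute_sitemaps(text: str) -> list[str]:
    """Return Sitemap: values that are not full http(s) URLs (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    found = parser.site_maps() or []  # site_maps() returns None when there are none
    return [url for url in found
            if urlsplit(url).scheme not in ("http", "https") or not urlsplit(url).netloc]


parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
print(parser.site_maps())
# ['https://yoursite.com/sitemap.xml', 'https://yoursite.com/sitemap-news.xml']
print(non_absolute_sitemaps(robots_txt))                 # []
print(non_absolute_sitemaps("Sitemap: /sitemap.xml\n"))  # ['/sitemap.xml']
```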
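Google has retired its standalone robots.txt Tester. In Search Console, the robots.txt report now shows which robots.txt files Google found for your site, when they were last crawled, and any parsing errors, while the URL Inspection tool tells you whether a specific URL is blocked by robots.txt. Always test before publishing: a stray Disallow: / blocks crawlers from your entire site, and your pages can start dropping out of search results quickly. Bing Webmaster Tools also offers a robots.txt tester for Bingbot.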
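Alongside those tools, a small local check can catch a stray Disallow: / before the file goes live. This sketch assumes a hypothetical list of must-stay-crawlable URLs and uses Python's urllib.robotparser, which only approximates Googlebot (first-match, prefix-only, no * or $ support), so treat a clean result as a sanity check rather than a guarantee.

```python
from collections.abc import Sequence
from urllib.robotparser import RobotFileParser

# Hypothetical key pages that must never be blocked; replace with your own.
MUST_BE_CRAWLABLE = (
    "https://yoursite.com/",
    "https://yoursite.com/products/",
    "https://yoursite.com/blog/",
)


def accidental_blocks(robots_txt: str,
                      urls: Sequence[str] = MUST_BE_CRAWLABLE,
                      agents: Sequence[str] = ("Googlebot", "Bingbot")) -> list[tuple[str, str]]:
    """Return (agent, url) pairs that the draft robots.txt would block."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [(agent, url) for agent in agents for url in urls
            if not parser.can_fetch(agent, url)]


draft = "User-agent: *\nDisallow: /\n"  # the stray rule that blocks the whole site
problems = accidental_blocks(draft)
print(len(problems))  # 6: every key URL is blocked for both agents
```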