{"id":147863,"date":"2024-01-17T14:42:05","date_gmt":"2024-01-17T14:42:05","guid":{"rendered":"https:\/\/www.similarweb.com\/blog\/?p=147863"},"modified":"2024-12-19T15:39:01","modified_gmt":"2024-12-19T15:39:01","slug":"robots-txt","status":"publish","type":"post","link":"https:\/\/www.similarweb.com\/blog\/marketing\/seo\/robots-txt\/","title":{"rendered":"Robots.txt for SEO: The Ultimate Guide"},"content":{"rendered":"<p>There are times when you must stop Google from crawling your site.<\/p>\n<p>And the way you do this is by creating a tiny little file called a robots.txt file.<\/p>\n<p>But don&#8217;t let its size fool you; when used correctly, it can boost your SEO.<\/p>\n<p>If you use it wrong, your content might never see the light of day.<\/p>\n<p>In this post, we&#8217;ll get into:<\/p>\n<ul>\n<li aria-level=\"1\">What robots.txt means<\/li>\n<li aria-level=\"1\">When you should use a robots.txt file<\/li>\n<li aria-level=\"1\">How to create a robots.txt file<\/li>\n<li aria-level=\"1\">Some examples of robots.txt files<\/li>\n<\/ul>\n<h2>What is a robots.txt file?<\/h2>\n<p>A robots.txt file, situated in a website&#8217;s root directory, instructs web crawlers which pages or directories of the site they should not crawl. 
This file is crucial for managing search engine access: keeping crawlers out of private areas, controlling bandwidth usage, and focusing search engine attention on the important areas of the site.<\/p>\n<p>The robots.txt file is part of a group of web standards called the Robots Exclusion Protocol (REP) that regulate how web bots crawl the web to index content.<\/p>\n<h3>An example of a robots.txt file<\/h3>\n<p><i>User-agent: *<\/i><\/p>\n<p><i>Disallow: \/private\/<\/i><\/p>\n<p><i>Disallow: \/restricted-page.html<\/i><\/p>\n<p><i>Disallow: \/images\/<\/i><\/p>\n<p><i>Allow: \/images\/public\/<\/i><\/p>\n<p>In this example:<\/p>\n<ul>\n<li aria-level=\"1\"><b>User-agent:<\/b> * is a wildcard that applies the rules to all web crawlers or robots.<\/li>\n<li aria-level=\"1\"><b>Disallow:<\/b> specifies the directories or files that should not be crawled. For example, the <b>\/private\/<\/b> directory and the <b>\/restricted-page.html<\/b> file are off-limits.<\/li>\n<li aria-level=\"1\"><b>Allow:<\/b> is used to override a Disallow rule. In this case, while the entire \/images\/ directory is disallowed, the \/images\/public\/ subdirectory is allowed.<\/li>\n<\/ul>\n<h2>How to find a robots.txt file<\/h2>\n<p>Finding a site&#8217;s robots.txt file is simple. 
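A quick way to sanity-check rules like the example file shown earlier is Python's standard-library robots.txt parser. This is a sketch, not Google's implementation: urllib.robotparser applies rules in file order (first match wins) rather than Google's most-specific-match logic, which is why the Allow line is placed before the broader Disallow below. The example.com URLs are hypothetical.

```python
import urllib.robotparser

# Rules from the example above. The Allow line comes before the broader
# Disallow because urllib.robotparser stops at the first matching rule.
robots_txt = """\
User-agent: *
Disallow: /private/
Disallow: /restricted-page.html
Allow: /images/public/
Disallow: /images/
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(robots_txt.splitlines())

bot = "Googlebot"
print(rp.can_fetch(bot, "https://example.com/private/notes.html"))   # False: disallowed
print(rp.can_fetch(bot, "https://example.com/images/photo.png"))     # False: /images/ blocked
print(rp.can_fetch(bot, "https://example.com/images/public/a.png"))  # True: Allow overrides
print(rp.can_fetch(bot, "https://example.com/blog/post"))            # True: no rule matches
```

If the Allow line were listed after Disallow: /images/, this parser would report /images/public/ as blocked even though Google (which prefers the most specific rule) would allow it, so keep rule order in mind when testing locally.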
You can usually see it by typing the URL for the homepage of a site and adding &#8220;\/robots.txt.&#8221;<\/p>\n<p>For example:<\/p>\n<p><i>https:\/\/example.com\/robots.txt<\/i><\/p>\n<p><img decoding=\"async\" class=\"alignnone size-full wp-image-185417\" src=\"https:\/\/www.similarweb.com\/blog\/wp-content\/uploads\/2024\/01\/Robots-txt-file-for-Cheq.png\" alt=\"Robots.txt file for cheq.ai\" width=\"844\" height=\"481\" srcset=\"https:\/\/www.similarweb.com\/blog\/wp-content\/uploads\/2024\/01\/Robots-txt-file-for-Cheq.png 844w, https:\/\/www.similarweb.com\/blog\/wp-content\/uploads\/2024\/01\/Robots-txt-file-for-Cheq-300x171.png 300w, https:\/\/www.similarweb.com\/blog\/wp-content\/uploads\/2024\/01\/Robots-txt-file-for-Cheq-768x438.png 768w, https:\/\/www.similarweb.com\/blog\/wp-content\/uploads\/2024\/01\/Robots-txt-file-for-Cheq-512x292.png 512w\" sizes=\"(max-width: 844px) 100vw, 844px\" \/><\/p>\n<h2>Why do you need a robots.txt file?<\/h2>\n<p>In general, you should review your robots.txt files as part of a <a href=\"https:\/\/www.similarweb.com\/blog\/marketing\/seo\/site-audit-guide\/\">comprehensive site audit<\/a>. But your site might not need a robots.txt file. Without one, the Google bot will crawl through your entire site. This is exactly what you want it to do if you want your entire site to be indexed. You only need one if you want more control over what search engines crawl.<\/p>\n<p>Here are the main scenarios in which you will need a robots.txt file:<\/p>\n<h3>1. Crawl budget optimization<\/h3>\n<p>Each website has a <a href=\"https:\/\/www.similarweb.com\/blog\/marketing\/search-podcast\/crawl-budget-optimization\/\">crawl budget<\/a>. This means in a given time frame, Google will crawl a limited amount of pages on a site.<\/p>\n<p>If the amount of pages on your site exceeds the crawl budget, you&#8217;ll have pages that don&#8217;t make it into Google&#8217;s index. 
And when your pages are not in Google&#8217;s index, there is very little chance of them ranking in search.<\/p>\n<p>One easy way to optimize this is to make sure that search engine bots don&#8217;t crawl low-priority or non-essential content that doesn&#8217;t need frequent crawling. This could include duplicate pages, archives, or dynamically generated content that doesn&#8217;t significantly impact search rankings. This will save your crawl budget for the pages you do want indexed.<\/p>\n<p>You can easily monitor non-essential sections of your site by setting up site segment analysis using Similarweb&#8217;s <b>Website Segments <\/b>tool. This will show you whether those pages are earning organic traffic. Simply set up a segment that covers all your content. You can choose any rule, including:<\/p>\n<ul>\n<li aria-level=\"1\">Folders<\/li>\n<li aria-level=\"1\">Any variation of text<\/li>\n<li aria-level=\"1\">Exact text<\/li>\n<li aria-level=\"1\">Exact URLs<\/li>\n<\/ul>\n<p>Below, we are setting up a segment for the \/gp\/ subfolder on amazon.com.<\/p>\n<p><img decoding=\"async\" class=\"alignnone size-full wp-image-185418\" src=\"https:\/\/www.similarweb.com\/blog\/wp-content\/uploads\/2024\/01\/Creating-a-new-website-segment.png\" alt=\"Creating a new website segment\" width=\"609\" height=\"546\" srcset=\"https:\/\/www.similarweb.com\/blog\/wp-content\/uploads\/2024\/01\/Creating-a-new-website-segment.png 609w, https:\/\/www.similarweb.com\/blog\/wp-content\/uploads\/2024\/01\/Creating-a-new-website-segment-300x269.png 300w, https:\/\/www.similarweb.com\/blog\/wp-content\/uploads\/2024\/01\/Creating-a-new-website-segment-512x459.png 512w\" sizes=\"(max-width: 609px) 100vw, 609px\" \/><\/p>\n<p>Once your segment is set up, go to the Marketing Channels report and look at Organic Traffic. This will quickly show you whether this site segment is earning organic traffic that justifies its share of your crawl budget. 
Below, you can see that the segment we are tracking is getting 491.6K visits over one year.<\/p>\n<p><img decoding=\"async\" class=\"alignnone size-full wp-image-185419\" src=\"https:\/\/www.similarweb.com\/blog\/wp-content\/uploads\/2024\/01\/Marketing-Channels-report-showing-Organic-Traffic.png\" alt=\"Marketing Channels report showing Organic Traffic\" width=\"880\" height=\"719\" srcset=\"https:\/\/www.similarweb.com\/blog\/wp-content\/uploads\/2024\/01\/Marketing-Channels-report-showing-Organic-Traffic.png 880w, https:\/\/www.similarweb.com\/blog\/wp-content\/uploads\/2024\/01\/Marketing-Channels-report-showing-Organic-Traffic-300x245.png 300w, https:\/\/www.similarweb.com\/blog\/wp-content\/uploads\/2024\/01\/Marketing-Channels-report-showing-Organic-Traffic-768x627.png 768w, https:\/\/www.similarweb.com\/blog\/wp-content\/uploads\/2024\/01\/Marketing-Channels-report-showing-Organic-Traffic-512x418.png 512w\" sizes=\"(max-width: 880px) 100vw, 880px\" \/><\/p>\n<h3>2. Avoiding duplicate content issues<\/h3>\n<p>For many sites, duplicate content is unavoidable. For instance, if you run an ecommerce site, you may have multiple similar product pages competing for the same keyword. Blocking the redundant versions in robots.txt is an easy way to avoid this.<\/p>\n<h3>3. Prioritizing important content<\/h3>\n<p>By using the <i>Allow: <\/i>directive, you can explicitly permit search engines to crawl and index specific high-priority content on your site. This helps ensure that important pages are discovered and indexed.<\/p>\n<h3>4. 
Preventing indexing of admin or test areas<\/h3>\n<p>If your site has admin or test areas that should not be indexed, using <i>Disallow:<\/i> in the robots.txt file can help prevent search engines from including these areas in search results.<\/p>\n<h2>How does robots.txt work?<\/h2>\n<p>Robots.txt files inform search engine bots what pages to ignore and which pages to prioritize. To understand this, let&#8217;s first explore what bots do.<\/p>\n<h3>How search engine bots discover and index content<\/h3>\n<p>The job of a search engine is to make web content available to end users through search. To do this, search engine bots or spiders have to discover content by systematically visiting and analyzing web pages. This process is called crawling.<\/p>\n<p>To discover information, search engine bots start by visiting a list of known web pages. 
They then follow links from one page to another across the net.<\/p>\n<p><img decoding=\"async\" class=\"alignnone wp-image-185420 size-full\" src=\"https:\/\/www.similarweb.com\/blog\/wp-content\/uploads\/2024\/01\/Search-engine-bot.png\" alt=\"Search engine bot\" width=\"1200\" height=\"840\" srcset=\"https:\/\/www.similarweb.com\/blog\/wp-content\/uploads\/2024\/01\/Search-engine-bot.png 1200w, https:\/\/www.similarweb.com\/blog\/wp-content\/uploads\/2024\/01\/Search-engine-bot-300x210.png 300w, https:\/\/www.similarweb.com\/blog\/wp-content\/uploads\/2024\/01\/Search-engine-bot-1024x717.png 1024w, https:\/\/www.similarweb.com\/blog\/wp-content\/uploads\/2024\/01\/Search-engine-bot-768x538.png 768w, https:\/\/www.similarweb.com\/blog\/wp-content\/uploads\/2024\/01\/Search-engine-bot-512x358.png 512w\" sizes=\"(max-width: 1200px) 100vw, 1200px\" \/><\/p>\n<p>Once a page is crawled, the information is parsed, and relevant data is stored in the search engine&#8217;s index. The index is a massive database that allows the search engine to quickly retrieve and display relevant results when a user performs a search query.<\/p>\n<h3>How do robots.txt files impact crawling and indexing?<\/h3>\n<p>When a bot lands on a site, it checks for a robots.txt file to determine how it should crawl and index the site. If the file is present, it provides instructions for crawling. If there&#8217;s no robots.txt file or it lacks crawling instructions, the bot will proceed to crawl the site.<\/p>\n<p>The robots.txt file starts by specifying the user agent. A user agent refers to software that accesses web content, in our case, a search engine bot. 
It also includes directives such as<\/p>\n<ul>\n<li aria-level=\"1\">Allow:<\/li>\n<li aria-level=\"1\">Disallow:<\/li>\n<\/ul>\n<p>For example:<\/p>\n<p><i>User-agent: *<\/i><\/p>\n<p><b><i>Disallow:<\/i><\/b><i> \/private\/<\/i><\/p>\n<p><b><i>Allow:<\/i><\/b><i> \/public\/<\/i><\/p>\n<p><b><i>Disallow:<\/i><\/b><i> \/restricted\/<\/i><\/p>\n<p>In this example:<\/p>\n<ul>\n<li aria-level=\"1\"><b>User-agent: <\/b>* applies the rules to all web crawlers.<\/li>\n<li aria-level=\"1\"><b>Disallow:<\/b> \/private\/ instructs all web crawlers to avoid crawling the \/private\/ directory.<\/li>\n<li aria-level=\"1\"><b>Allow:<\/b> \/public\/ explicitly permits all web crawlers to crawl the \/public\/ directory.<\/li>\n<li aria-level=\"1\"><b>Disallow:<\/b> \/restricted\/ further disallows crawling of the \/restricted\/ directory.<\/li>\n<\/ul>\n<p>It&#8217;s important to note that robots.txt rules are directives that search engine bots will generally follow. But a disallowed page can still end up in search results: if other pages link to it, Google may index the URL without ever crawling it.<\/p>\n<p>To keep a page out of the index, use noindex in the &lt;head&gt; section of the page&#8217;s HTML, and make sure the page is not disallowed in robots.txt, or Google will never see the tag.<\/p>\n<p><i>&lt;meta name=&quot;robots&quot; content=&quot;noindex&quot;&gt;<\/i><\/p>\n<h2>Implementing crawl directives: Understanding robots.txt syntax<\/h2>\n<p>A robots.txt file informs a search engine how to crawl by using directives. A directive is a command that provides a system (in this case, a search engine bot) information on how to behave.<\/p>\n<p>Each directive begins by first specifying the user-agent and then setting the rules for that user-agent. The user agent refers to the application that acts on behalf of a user when interacting with a system or network. 
In our case, the user agent refers to the search engine bot, such as Googlebot or Bingbot.<\/p>\n<p>For example:<\/p>\n<ul>\n<li aria-level=\"1\">User-agent: Googlebot<\/li>\n<li aria-level=\"1\">User-agent: Bingbot<\/li>\n<\/ul>\n<p>Below, we have compiled two lists; one contains supported directives and the other unsupported directives.<\/p>\n<h3>Supported Directives<\/h3>\n<p><b>Disallow:<\/b> This directive prevents search engines from crawling certain areas of a website. You can:<\/p>\n<ol>\n<li aria-level=\"1\">Block access to all directories for all user agents.<br \/>\n<i>user-agent: *<\/i> (The \u2018*\u2019 is a wildcard. See below.)<br \/>\n<i>Disallow: \/<\/i><\/li>\n<li aria-level=\"1\">Block a particular directory for all user agents.<br \/>\n<i>user-agent: *<\/i><i><br \/>\n<\/i><i>Disallow: \/portfolio<\/i><\/li>\n<li aria-level=\"1\">Block access to PDFs or any other file type for all user agents using the appropriate file extension.<br \/>\n<i>user-agent: *<\/i><i><br \/>\n<\/i><i>Disallow: *.pdf<\/i><\/li>\n<\/ol>\n<p><b>Allow:<\/b> This directive allows search engines to crawl a page or directory. Use it to override a Disallow directive. Below, we are blocking search engines from crawling the \/portfolio folder but allowing them access to the \/allowed-portfolio subfolder in the \/portfolio folder.<\/p>\n<p><i>user-agent: *<\/i><br \/>\n<i>Disallow: \/portfolio<\/i><i><br \/>\n<\/i><i>Allow: \/portfolio\/allowed-portfolio<\/i><\/p>\n<p><b>Sitemap: <\/b>You can specify the location of your <a href=\"https:\/\/www.similarweb.com\/blog\/marketing\/seo\/sitemaps\/\">sitemap<\/a> in your robots.txt file. 
A sitemap is a file on your site that provides a structured list of URLs to help search engines crawl your site.<\/p>\n<p><i>Sitemap: https:\/\/www.example.com\/sitemap.xml<\/i><\/p>\n<p>If you want to understand more about directives, check out <a href=\"https:\/\/developers.google.com\/search\/docs\/crawling-indexing\/robots\/create-robots-txt?hl=en&amp;ref_topic=6061961&amp;visit_id=638404040977831162-3966239109&amp;rd=1\">Google&#8217;s<\/a> robots.txt guide.<\/p>\n<h3>Unsupported Directives<\/h3>\n<p>In 2019, Google posted that <b>crawl-delay<\/b>, <b>nofollow<\/b>, and <b>noindex<\/b> are not supported in robots.txt files. If you include them in your robots.txt files, they simply will not work. In reality, these rules were never officially supported by Google and were not intended to appear in robots.txt files; nofollow and noindex belong in the robots meta tag on individual pages of your site.<\/p>\n<p>There are other options if you want to exclude pages from Google&#8217;s index, including:<\/p>\n<ul>\n<li aria-level=\"1\"><b>Using the meta tag with noindex:<\/b><\/li>\n<\/ul>\n<p>Add the following HTML meta tag to the &lt;head&gt; section of the page&#8217;s HTML:<\/p>\n<p><i>&lt;meta name=&quot;robots&quot; content=&quot;noindex&quot;&gt;<\/i><\/p>\n<ul>\n<li aria-level=\"1\"><b>Using the X-Robots-Tag HTTP header:<\/b><\/li>\n<\/ul>\n<p>If you have access to server configuration, you can use the X-Robots-Tag HTTP header to achieve a similar result.<\/p>\n<p>For example:<\/p>\n<p><i>X-Robots-Tag: noindex<\/i><\/p>\n<ul>\n<li aria-level=\"1\"><b>Using Google Search Console:<\/b><\/li>\n<\/ul>\n<p>You can use the Removals tool to request the temporary removal of a specific URL from Google\u2019s index.<\/p>\n<p>Since crawl-delay is not supported by Google, you cannot slow Googlebot down through robots.txt; Google adjusts its crawl rate automatically based on how your server responds.<\/p>\n<h3>Using wildcards<\/h3>\n<p>Wildcards are characters you can use to provide directives that 
apply to multiple URLs at once. The two main wildcards used in robots.txt files are the asterisk (*) and the dollar sign ($).<\/p>\n<p>You can apply them to directives or to user agents.<\/p>\n<p>For example:<\/p>\n<ol>\n<li aria-level=\"1\"><b>Asterisk (*):<\/b> When applied to user agents, the wildcard means \u201capply to all user agents.&#8221; When applied to URLs, it means &#8220;apply to all URLs.&#8221; If you have URLs that follow the same pattern, this will save you time.<\/li>\n<li aria-level=\"1\"><b>Dollar sign ($):<\/b> The dollar sign is used at the end of a URL pattern to match URLs that end with a specific string.<\/li>\n<\/ol>\n<p>In the example below, we are blocking search engines from crawling all PDF files.<\/p>\n<p><i>User-agent: *<\/i><\/p>\n<p><i>Disallow: \/*.pdf$<\/i><\/p>\n<p>URLs that end with .pdf will not be accessible. But take note that if your URL has additional text after the .pdf ending, then that URL will be accessible.<\/p>\n<h2>How to create robots.txt files<\/h2>\n<p>If your website doesn&#8217;t have a robots.txt file, you can easily create one in a text editor. Simply open a blank .txt document and insert your directives. When you are finished, just save the file as &#8216;robots.txt,&#8217; and there you have it.<\/p>\n<p>Now, you might be wondering where to put your robots.txt file.<\/p>\n<p>Search engines only look for the file in one place: the root directory of your domain. If you upload it anywhere else, bots will not find it.<\/p>\n<p>So upload it to the root directory of your website, and make sure it is accessible via a web browser at the path https:\/\/www.yourdomain.com\/robots.txt. 
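Alongside Search Console, you can sanity-check a draft file locally before uploading it. The sketch below uses Python's standard-library urllib.robotparser with the portfolio example from earlier plus a Sitemap line (the example.com URLs are placeholders). Two caveats: this parser evaluates rules in file order rather than by Google's most-specific match, and it does not implement the * and $ wildcard extensions, so wildcard rules should still be verified with Google's own tools.

```python
import urllib.robotparser

# A draft robots.txt: the Allow override is listed first because this
# parser stops at the first rule whose path prefix matches the URL.
robots_txt = """\
User-agent: *
Allow: /portfolio/allowed-portfolio
Disallow: /portfolio
Sitemap: https://www.example.com/sitemap.xml
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(robots_txt.splitlines())

print(rp.can_fetch("Googlebot", "https://www.example.com/portfolio/old-work"))  # False
print(rp.can_fetch("Googlebot",
                   "https://www.example.com/portfolio/allowed-portfolio/item"))  # True
print(rp.site_maps())  # ['https://www.example.com/sitemap.xml']
```

To check the live file instead of a local draft, call rp.set_url("https://www.yourdomain.com/robots.txt") followed by rp.read() before querying can_fetch().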
If you want to test how effective your robots.txt file is, you can check any URL with the <a href=\"https:\/\/support.google.com\/webmasters\/answer\/9012289?sjid=11699420002697656696-EU\">Google Search Console URL Inspection tool<\/a>.<\/p>\n<p><img decoding=\"async\" class=\"alignnone size-full wp-image-185421\" src=\"https:\/\/www.similarweb.com\/blog\/wp-content\/uploads\/2024\/01\/Google-Search-Console-URL-inspection-tool.png\" alt=\"Google Search Console URL inspection tool\" width=\"875\" height=\"376\" srcset=\"https:\/\/www.similarweb.com\/blog\/wp-content\/uploads\/2024\/01\/Google-Search-Console-URL-inspection-tool.png 875w, https:\/\/www.similarweb.com\/blog\/wp-content\/uploads\/2024\/01\/Google-Search-Console-URL-inspection-tool-300x129.png 300w, https:\/\/www.similarweb.com\/blog\/wp-content\/uploads\/2024\/01\/Google-Search-Console-URL-inspection-tool-768x330.png 768w, https:\/\/www.similarweb.com\/blog\/wp-content\/uploads\/2024\/01\/Google-Search-Console-URL-inspection-tool-512x220.png 512w\" sizes=\"(max-width: 875px) 100vw, 875px\" \/><\/p>\n<h2>How to add robots.txt to WordPress<\/h2>\n<p>If you use WordPress, the easiest way to create a robots.txt file is to use a plugin like <b>Yoast<\/b> or <b>All in One SEO Pack<\/b>.<\/p>\n<p>If you use <b>Yoast<\/b>, go to SEO &gt; Tools &gt; File Editor. Click on the robots.txt tab, and you can create or edit your robots.txt file there.<\/p>\n<p>If you use <b>All in One SEO Pack<\/b>, go to All in One SEO &gt; Feature Manager. Activate the &#8220;Robots.txt&#8221; feature, and you can configure your directives from there.<\/p>\n<h2>Common mistakes you want to avoid<\/h2>\n<p>Although there are many benefits to using robots.txt, getting it wrong can kill your traffic. 
Let&#8217;s get into some mistakes to avoid.<\/p>\n<ul>\n<li aria-level=\"1\"><b>Blocking important content: <\/b>By using overly restrictive rules, you might accidentally block important sections of your site.<\/li>\n<li aria-level=\"1\"><b>Blocking CSS, JavaScript, and image files: <\/b>Search engines use these resources to render and understand the structure of your pages.<\/li>\n<li aria-level=\"1\"><b>Incorrect case sensitivity: <\/b>Robots.txt directives are case sensitive, so \/Private\/ and \/private\/ are treated as different paths.<\/li>\n<li aria-level=\"1\"><b>Assuming security through robots.txt: <\/b>Sensitive content should be protected by other means, as robots.txt is a guideline and does not ensure that pages will stay out of the index.<\/li>\n<li aria-level=\"1\"><b>Incorrect syntax: <\/b>Validate your files, as typos can lead to search engines misinterpreting your robots.txt rules.<\/li>\n<\/ul>\n<h2>Robots.txt files: The final word<\/h2>\n<p>You now have a comprehensive understanding of robots.txt files. You know what they are, how they work, and how they can be used to enhance your SEO. Just remember always to review and test your robots.txt files. 
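The case-sensitivity mistake listed above is easy to demonstrate with Python's standard-library urllib.robotparser (hypothetical example.com paths): a rule written as /Private/ does not block /private/.

```python
import urllib.robotparser

# robots.txt path matching is case sensitive:
# Disallow: /Private/ does NOT cover /private/.
robots_txt = """\
User-agent: *
Disallow: /Private/
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(robots_txt.splitlines())

print(rp.can_fetch("Googlebot", "https://example.com/Private/doc.html"))  # False: blocked
print(rp.can_fetch("Googlebot", "https://example.com/private/doc.html"))  # True: not blocked!
```

If you meant to block both spellings, you need a Disallow line for each, so it pays to mirror the exact casing of your real URLs.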
Done right, they will serve you well; done wrong, they might just mean the end of your organic traffic.<\/p>\n<p>Use them wisely.<\/p>\n<h2>FAQs<\/h2>\n<p><b>What is the robots.txt file?<\/b><\/p>\n<p>Robots.txt is a text file located in the root directory of a site and is used to inform web crawlers how to crawl the site.<\/p>\n<p><b>How do I access a robots.txt file?<\/b><\/p>\n<p>The easiest way to access a robots.txt file is to type the site\u2019s URL into your browser and then add \/robots.txt to the end. It should look like this: https:\/\/www.example.com\/robots.txt.<\/p>\n<p><b>Is robots.txt good for SEO?<\/b><\/p>\n<p>The robots.txt file plays an important role in SEO. Although it doesn&#8217;t directly impact a website&#8217;s rankings, it helps search engines understand the site&#8217;s structure and which pages to crawl or skip.<\/p>\n<p>The robots.txt file can contribute to SEO by:<\/p>\n<ul>\n<li aria-level=\"1\">Controlling Crawling<\/li>\n<li aria-level=\"1\">Preserving Crawl Budgets<\/li>\n<li aria-level=\"1\">Managing Sitemaps<\/li>\n<li aria-level=\"1\">Preventing Indexation of Duplicate Content<\/li>\n<\/ul>\n<p>It&#8217;s important to note that robots.txt files should be used carefully. 
Incorrectly configuring the file can inadvertently block search engines from accessing important content, leading to a negative impact on your site&#8217;s visibility.<\/p>\n<p><b>When should you use a robots.txt file?<\/b><\/p>\n<p>Use a robots.txt file to control search engine crawling. Restrict sensitive areas, prevent indexing of duplicate content, manage crawl budget, and guide bots away from non-essential or private content.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>There are times when you must stop Google from crawling your site. And the way you do this is by creating a tiny little file called a robots.txt file. But don&#8217;t let its size fool you; when used correctly, it could boost your SEO. If you use it wrong, your content might never see the [&hellip;]<\/p>\n","protected":false},"author":499,"featured_media":185522,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"inline_featured_image":false,"footnotes":""},"categories":[2803,6345],"tags":[],"class_list":["post-147863","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-marketing","category-seo"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v26.1 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Robots.txt for SEO: The Ultimate Guide | Similarweb<\/title>\n<meta name=\"description\" content=\"There are times you should influence how Google crawls your site. In this post, we reveal what a robots.txt file is and how to use it to boost your SEO.\" \/>\n<meta name=\"robots\" content=\"max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.similarweb.com\/blog\/marketing\/seo\/robots-txt\/\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Darrell Mordecai\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"15 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/www.similarweb.com\/blog\/marketing\/seo\/robots-txt\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.similarweb.com\/blog\/marketing\/seo\/robots-txt\/\"},\"author\":{\"name\":\"Darrell Mordecai\",\"@id\":\"https:\/\/www.similarweb.com\/blog\/#\/schema\/person\/645f1730e736ea84615ff69fc556fbbc\"},\"headline\":\"Robots.txt for SEO: The Ultimate Guide\",\"datePublished\":\"2024-01-17T14:42:05+00:00\",\"dateModified\":\"2024-12-19T15:39:01+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.similarweb.com\/blog\/marketing\/seo\/robots-txt\/\"},\"wordCount\":2699,\"publisher\":{\"@id\":\"https:\/\/www.similarweb.com\/blog\/#organization\"},\"image\":{\"@id\":\"https:\/\/www.similarweb.com\/blog\/marketing\/seo\/robots-txt\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.similarweb.com\/blog\/wp-content\/uploads\/2022\/03\/Robots.txt-for-SEO_-A-Complete-Guide.png\",\"articleSection\":[\"Marketing\",\"SEO\"],\"inLanguage\":\"\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.similarweb.com\/blog\/marketing\/seo\/robots-txt\/\",\"url\":\"https:\/\/www.similarweb.com\/blog\/marketing\/seo\/robots-txt\/\",\"name\":\"Robots.txt for SEO: The Ultimate Guide | Similarweb\",\"isPartOf\":{\"@id\":\"https:\/\/www.similarweb.com\/blog\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.similarweb.com\/blog\/marketing\/seo\/robots-txt\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.similarweb.com\/blog\/marketing\/seo\/robots-txt\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.similarweb.com\/blog\/wp-content\/uploads\/2022\/03\/Robots.txt-for-SEO_-A-Complete-Guide.png\",\"datePublished\":\"2024-01-17T14:42:05+00:00\",\"dateModified\":\"2024-12-19T15:39:01+00:00\",\"description\":\"There are times you should influence how 
Google crawls your site. In this post, we reveal what a robots.txt file is and how to use it to boost your SEO.\",\"breadcrumb\":{\"@id\":\"https:\/\/www.similarweb.com\/blog\/marketing\/seo\/robots-txt\/#breadcrumb\"},\"inLanguage\":\"\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.similarweb.com\/blog\/marketing\/seo\/robots-txt\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"\",\"@id\":\"https:\/\/www.similarweb.com\/blog\/marketing\/seo\/robots-txt\/#primaryimage\",\"url\":\"https:\/\/www.similarweb.com\/blog\/wp-content\/uploads\/2022\/03\/Robots.txt-for-SEO_-A-Complete-Guide.png\",\"contentUrl\":\"https:\/\/www.similarweb.com\/blog\/wp-content\/uploads\/2022\/03\/Robots.txt-for-SEO_-A-Complete-Guide.png\",\"width\":2124,\"height\":1260},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.similarweb.com\/blog\/marketing\/seo\/robots-txt\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.similarweb.com\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Robots.txt for SEO: The Ultimate 
Guide\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.similarweb.com\/blog\/#website\",\"url\":\"https:\/\/www.similarweb.com\/blog\/\",\"name\":\"Similarweb\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\/\/www.similarweb.com\/blog\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.similarweb.com\/blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.similarweb.com\/blog\/#organization\",\"name\":\"Similarweb\",\"url\":\"https:\/\/www.similarweb.com\/blog\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"\",\"@id\":\"https:\/\/www.similarweb.com\/blog\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/www.similarweb.com\/blog\/wp-content\/uploads\/2021\/03\/1587374135933.png\",\"contentUrl\":\"https:\/\/www.similarweb.com\/blog\/wp-content\/uploads\/2021\/03\/1587374135933.png\",\"width\":200,\"height\":200,\"caption\":\"Similarweb\"},\"image\":{\"@id\":\"https:\/\/www.similarweb.com\/blog\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/Similarweb\",\"https:\/\/x.com\/Similarweb\",\"https:\/\/www.youtube.com\/channel\/UCVCI01HR6iB4AA4ChW08cvQ\",\"https:\/\/www.instagram.com\/similarwebinsights\/\",\"https:\/\/www.linkedin.com\/company\/similarweb\",\"https:\/\/en.wikipedia.org\/wiki\/Similarweb\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.similarweb.com\/blog\/#\/schema\/person\/645f1730e736ea84615ff69fc556fbbc\",\"name\":\"Darrell 
Mordecai\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"\",\"@id\":\"https:\/\/www.similarweb.com\/blog\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/www.similarweb.com\/blog\/wp-content\/uploads\/2023\/06\/darrelm.jpg\",\"contentUrl\":\"https:\/\/www.similarweb.com\/blog\/wp-content\/uploads\/2023\/06\/darrelm.jpg\",\"caption\":\"Darrell Mordecai\"},\"description\":\"Darrell Mordecai is a content marketing manager at Similarweb. After working as an SEO manager and geeking out on Google patents, he acquired a deep understanding of SEO, which he regularly pulls from to create content and copy for the SEO industry.\",\"sameAs\":[\"https:\/\/www.similarweb.com\/\",\"https:\/\/www.linkedin.com\/in\/darrell-mordecai-6a401316\/\",\"https:\/\/x.com\/MordecaiDarrell\"],\"url\":\"https:\/\/www.similarweb.com\/blog\/author\/darrell-mordecai\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Robots.txt for SEO: The Ultimate Guide | Similarweb","description":"There are times you should influence how Google crawls your site. In this post, we reveal what a robots.txt file is and how to use it to boost your SEO.","robots":{"max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.similarweb.com\/blog\/marketing\/seo\/robots-txt\/","twitter_misc":{"Written by":"Darrell Mordecai","Est. 
reading time":"15 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.similarweb.com\/blog\/marketing\/seo\/robots-txt\/#article","isPartOf":{"@id":"https:\/\/www.similarweb.com\/blog\/marketing\/seo\/robots-txt\/"},"author":{"name":"Darrell Mordecai","@id":"https:\/\/www.similarweb.com\/blog\/#\/schema\/person\/645f1730e736ea84615ff69fc556fbbc"},"headline":"Robots.txt for SEO: The Ultimate Guide","datePublished":"2024-01-17T14:42:05+00:00","dateModified":"2024-12-19T15:39:01+00:00","mainEntityOfPage":{"@id":"https:\/\/www.similarweb.com\/blog\/marketing\/seo\/robots-txt\/"},"wordCount":2699,"publisher":{"@id":"https:\/\/www.similarweb.com\/blog\/#organization"},"image":{"@id":"https:\/\/www.similarweb.com\/blog\/marketing\/seo\/robots-txt\/#primaryimage"},"thumbnailUrl":"https:\/\/www.similarweb.com\/blog\/wp-content\/uploads\/2022\/03\/Robots.txt-for-SEO_-A-Complete-Guide.png","articleSection":["Marketing","SEO"],"inLanguage":""},{"@type":"WebPage","@id":"https:\/\/www.similarweb.com\/blog\/marketing\/seo\/robots-txt\/","url":"https:\/\/www.similarweb.com\/blog\/marketing\/seo\/robots-txt\/","name":"Robots.txt for SEO: The Ultimate Guide | Similarweb","isPartOf":{"@id":"https:\/\/www.similarweb.com\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.similarweb.com\/blog\/marketing\/seo\/robots-txt\/#primaryimage"},"image":{"@id":"https:\/\/www.similarweb.com\/blog\/marketing\/seo\/robots-txt\/#primaryimage"},"thumbnailUrl":"https:\/\/www.similarweb.com\/blog\/wp-content\/uploads\/2022\/03\/Robots.txt-for-SEO_-A-Complete-Guide.png","datePublished":"2024-01-17T14:42:05+00:00","dateModified":"2024-12-19T15:39:01+00:00","description":"There are times you should influence how Google crawls your site. 
In this post, we reveal what a robots.txt file is and how to use it to boost your SEO.","breadcrumb":{"@id":"https:\/\/www.similarweb.com\/blog\/marketing\/seo\/robots-txt\/#breadcrumb"},"inLanguage":"","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.similarweb.com\/blog\/marketing\/seo\/robots-txt\/"]}]},{"@type":"ImageObject","inLanguage":"","@id":"https:\/\/www.similarweb.com\/blog\/marketing\/seo\/robots-txt\/#primaryimage","url":"https:\/\/www.similarweb.com\/blog\/wp-content\/uploads\/2022\/03\/Robots.txt-for-SEO_-A-Complete-Guide.png","contentUrl":"https:\/\/www.similarweb.com\/blog\/wp-content\/uploads\/2022\/03\/Robots.txt-for-SEO_-A-Complete-Guide.png","width":2124,"height":1260},{"@type":"BreadcrumbList","@id":"https:\/\/www.similarweb.com\/blog\/marketing\/seo\/robots-txt\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.similarweb.com\/"},{"@type":"ListItem","position":2,"name":"Robots.txt for SEO: The Ultimate 
Guide"}]},{"@type":"WebSite","@id":"https:\/\/www.similarweb.com\/blog\/#website","url":"https:\/\/www.similarweb.com\/blog\/","name":"Similarweb","description":"","publisher":{"@id":"https:\/\/www.similarweb.com\/blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.similarweb.com\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":""},{"@type":"Organization","@id":"https:\/\/www.similarweb.com\/blog\/#organization","name":"Similarweb","url":"https:\/\/www.similarweb.com\/blog\/","logo":{"@type":"ImageObject","inLanguage":"","@id":"https:\/\/www.similarweb.com\/blog\/#\/schema\/logo\/image\/","url":"https:\/\/www.similarweb.com\/blog\/wp-content\/uploads\/2021\/03\/1587374135933.png","contentUrl":"https:\/\/www.similarweb.com\/blog\/wp-content\/uploads\/2021\/03\/1587374135933.png","width":200,"height":200,"caption":"Similarweb"},"image":{"@id":"https:\/\/www.similarweb.com\/blog\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/Similarweb","https:\/\/x.com\/Similarweb","https:\/\/www.youtube.com\/channel\/UCVCI01HR6iB4AA4ChW08cvQ","https:\/\/www.instagram.com\/similarwebinsights\/","https:\/\/www.linkedin.com\/company\/similarweb","https:\/\/en.wikipedia.org\/wiki\/Similarweb"]},{"@type":"Person","@id":"https:\/\/www.similarweb.com\/blog\/#\/schema\/person\/645f1730e736ea84615ff69fc556fbbc","name":"Darrell Mordecai","image":{"@type":"ImageObject","inLanguage":"","@id":"https:\/\/www.similarweb.com\/blog\/#\/schema\/person\/image\/","url":"https:\/\/www.similarweb.com\/blog\/wp-content\/uploads\/2023\/06\/darrelm.jpg","contentUrl":"https:\/\/www.similarweb.com\/blog\/wp-content\/uploads\/2023\/06\/darrelm.jpg","caption":"Darrell Mordecai"},"description":"Darrell Mordecai is a content marketing manager at Similarweb. 
After working as an SEO manager and geeking out on Google patents, he acquired a deep understanding of SEO, which he regularly pulls from to create content and copy for the SEO industry.","sameAs":["https:\/\/www.similarweb.com\/","https:\/\/www.linkedin.com\/in\/darrell-mordecai-6a401316\/","https:\/\/x.com\/MordecaiDarrell"],"url":"https:\/\/www.similarweb.com\/blog\/author\/darrell-mordecai\/"}]}},"lang":"en","translations":{"en":147863},"pll_sync_post":[],"_links":{"self":[{"href":"https:\/\/www.similarweb.com\/blog\/wp-json\/wp\/v2\/posts\/147863","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.similarweb.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.similarweb.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.similarweb.com\/blog\/wp-json\/wp\/v2\/users\/499"}],"replies":[{"embeddable":true,"href":"https:\/\/www.similarweb.com\/blog\/wp-json\/wp\/v2\/comments?post=147863"}],"version-history":[{"count":5,"href":"https:\/\/www.similarweb.com\/blog\/wp-json\/wp\/v2\/posts\/147863\/revisions"}],"predecessor-version":[{"id":198499,"href":"https:\/\/www.similarweb.com\/blog\/wp-json\/wp\/v2\/posts\/147863\/revisions\/198499"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.similarweb.com\/blog\/wp-json\/wp\/v2\/media\/185522"}],"wp:attachment":[{"href":"https:\/\/www.similarweb.com\/blog\/wp-json\/wp\/v2\/media?parent=147863"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.similarweb.com\/blog\/wp-json\/wp\/v2\/categories?post=147863"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.similarweb.com\/blog\/wp-json\/wp\/v2\/tags?post=147863"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}