Google SEO 101: Blocking Special Files in Robots.txt
by @MattGSouthern
Google’s John Mueller answers a question about using robots.txt to block special files, including .css and .htaccess.
This topic was discussed in some detail in the latest edition of the Ask Google Webmasters video series on YouTube.
Here is the question that was submitted:
“Regarding robots.txt, should I ‘disallow: /*.css$’, ‘disallow: /php.ini’, or even ‘disallow: /.htaccess’?”
In response, Mueller says Google can’t stop site owners from disallowing those files, although it’s certainly not recommended.
“No. I can’t disallow you from disallowing those files. But that sounds like a bad idea. You mention a few special cases so let’s take a look.”
In some cases, blocking special files is simply redundant; in others, it can seriously impact Googlebot’s ability to crawl a site.
Here’s an explanation of what will happen when each type of special file is blocked.
Related: How to Address Security Risks with Robots.txt Files
Blocking CSS
Crawling CSS is absolutely critical as it allows Googlebot to properly render pages.
Site owners may feel it’s necessary to block CSS files so the files don’t get indexed on their own, but Mueller says that usually doesn’t happen.
Google needs the files regardless, so even if a CSS file ends up getting indexed, that does far less harm than blocking it would.
This is Mueller’s response:
“‘*.css’ would block all CSS files. We need to be able to access CSS files so that we can properly render your pages.
This is critical so that we can recognize when a page is mobile-friendly, for example.
CSS files generally won’t get indexed on their own, but we need to be able to crawl them.”
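To make the CSS case concrete, a robots.txt containing the pattern from the question would look something like the snippet below. This is an illustrative reconstruction, not a file shown in the video; leaving this rule out is what allows Googlebot to fetch stylesheets and render pages properly.

User-agent: *
Disallow: /*.css$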
Blocking PHP
Using robots.txt to block php.ini isn’t necessary because it’s not a file that can be readily accessed anyway.
This file should be locked down, which prevents even Googlebot from accessing it. And that’s perfectly fine.
Disallowing php.ini is redundant, as Mueller explains:
“You also mentioned PHP.ini – this is a configuration file for PHP. In general, this file should be locked down, or in a special location so nobody can access it.
And if nobody can access it then that includes Googlebot too. So, again, no need to disallow crawling of that.”
Blocking htaccess
Like php.ini, .htaccess is a locked-down file, meaning it can’t be accessed externally, even by Googlebot.
It does not need to be disallowed because it can’t be crawled in the first place.
“Finally, you mentioned .htaccess. This is a special control file that cannot be accessed externally by default. Like other locked down files you don’t need to explicitly disallow it from crawling since it cannot be accessed at all.”
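Putting those two cases together, directives like the ones below, taken from the question, are simply redundant. On a properly configured server, php.ini and .htaccess are never served to visitors or to Googlebot in the first place.

User-agent: *
Disallow: /php.ini
Disallow: /.htaccess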
Mueller’s Recommendations
Mueller capped off the video with a few short words on how site owners should go about creating a robots.txt file.
Site owners tend to run into problems when they copy another site’s robots.txt file and use it as their own.
Mueller advises against that. Instead, think critically about which parts of your site you do not want crawled and only disallow those.
“My recommendation is to not just reuse someone else’s robots.txt file and assume it’ll work. Instead, think about which parts of your site you really don’t want to have crawled and just disallow crawling of those.”
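As a rough sketch of that advice, a minimal robots.txt would disallow only the sections you genuinely don’t want crawled and leave everything else, including CSS and other rendering resources, open. The /admin/ and /cart/ paths below are purely hypothetical examples, not recommendations from the video.

User-agent: *
Disallow: /admin/
Disallow: /cart/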
Related articles:
- Best Practices for Setting Up Meta Robots Tags & Robots.txt
- Google Says Robots.txt Blocking Certain External Resources is Okay