Google is going to stop supporting crawl-delay, nofollow, and noindex in robots.txt

by Umair Hashmi

Google announced on July 2, 2019 that it will stop supporting the crawl-delay, nofollow, and noindex rules in robots.txt files.

While open-sourcing our parser library, we analyzed the usage of robots.txt rules. In particular, we focused on rules unsupported by the internet draft, such as crawl-delay, nofollow, and noindex. Since these rules were never documented by Google, naturally, their usage in relation to Googlebot is very low. Digging further, we saw their usage was contradicted by other rules in all but 0.001% of all robots.txt files on the internet. These mistakes hurt websites’ presence in Google’s search results in ways we don’t think webmasters intended.

Google
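To check whether your own robots.txt is affected, you can scan it for these three rules. The snippet below is a minimal sketch, not Google's parser: the function name `find_unsupported_rules` and the sample file are made up for illustration, and it assumes simple `name: value` lines with optional `#` comments.

```python
# Rules named in Google's announcement as unsupported in robots.txt.
UNSUPPORTED = {"crawl-delay", "nofollow", "noindex"}


def find_unsupported_rules(robots_txt):
    """Return (line_number, rule, value) for each unsupported rule found."""
    hits = []
    for number, raw in enumerate(robots_txt.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if ":" not in line:
            continue  # blank line or something that is not a rule
        key, value = line.split(":", 1)
        key = key.strip().lower()  # rule names are matched case-insensitively here
        if key in UNSUPPORTED:
            hits.append((number, key, value.strip()))
    return hits


# Hypothetical robots.txt content used only for this example.
SAMPLE = """\
User-agent: *
Disallow: /private/
Noindex: /drafts/
Crawl-delay: 10
"""

for number, rule, value in find_unsupported_rules(SAMPLE):
    print(f"line {number}: {rule} = {value}")
```

Any line it reports is one that Googlebot will ignore, so the intent behind it needs to be expressed another way.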

Google suggests the following supported alternatives for controlling indexing:

Noindex in robots meta tags: Supported both in the HTTP response headers and in HTML, the noindex directive is the most effective way to remove URLs from the index when crawling is allowed.

404 and 410 HTTP status codes: Both status codes mean that the page does not exist, which will drop such URLs from Google’s index once they’re crawled and processed.

Password protection: Unless markup is used to indicate subscription or paywalled content, hiding a page behind a login will generally remove it from Google’s index.

Disallow in robots.txt: Search engines can only index pages that they know about, so blocking the page from being crawled usually means its content won’t be indexed. While the search engine may also index a URL based on links from other pages, without seeing the content itself, we aim to make such pages less visible in the future.

Search Console Remove URL tool: The tool is a quick and easy method to remove a URL temporarily from Google’s search results.

Google
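The first two alternatives can be sketched in a few lines of server-side code. This is only an illustration under stated assumptions: `build_response`, `NOINDEX_PATHS`, and `REMOVED_PATHS` are hypothetical names, and the paths are placeholders. The `X-Robots-Tag` response header and the `robots` meta tag are the two places where Google reads a noindex directive, and 410 is the "Gone" status code.

```python
from http import HTTPStatus

# Hypothetical paths for illustration; replace with URLs from your own site.
NOINDEX_PATHS = {"/drafts/preview.html"}
REMOVED_PATHS = {"/old-campaign.html"}


def build_response(path):
    """Return (status, headers, body) for a requested path."""
    if path in REMOVED_PATHS:
        # 410 Gone tells crawlers the page no longer exists.
        return HTTPStatus.GONE, {"Content-Type": "text/plain"}, "Gone"

    headers = {"Content-Type": "text/html; charset=utf-8"}
    robots_meta = ""
    if path in NOINDEX_PATHS:
        # noindex in the HTTP response header ...
        headers["X-Robots-Tag"] = "noindex"
        # ... and, equivalently, in the HTML head. Either one is enough.
        robots_meta = '<meta name="robots" content="noindex">'

    body = f"<html><head>{robots_meta}</head><body>...</body></html>"
    return HTTPStatus.OK, headers, body


status, headers, body = build_response("/drafts/preview.html")
print(int(status), headers.get("X-Robots-Tag"))
```

Keep in mind that a noindex directive only works when crawling is allowed: if the same URL is also blocked with Disallow in robots.txt, the crawler never fetches the page and never sees the directive.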
