Google’s Search Relations answered several questions about website indexing in the latest episode of the podcast, Search Off The Record.
Topics discussed were how to prevent Googlebot from crawling certain sections of a page and how to prevent Googlebot from accessing a website at all.
Google’s John Mueller and Gary Illyes answered the questions examined in this article.
Blocking Googlebot for certain sections of the website
Mueller said it is impossible to fully prevent Googlebot from crawling certain sections of a page, such as the “also purchased” sections on product pages.
“The short version is that you can’t block a specific section of an HTML page from being crawled,” Mueller said.
He then suggested two possible strategies for dealing with the problem, emphasizing that neither is an ideal solution.
Mueller suggested using the data-nosnippet HTML attribute to prevent text from appearing in a search snippet.
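As a rough illustration of Mueller's suggestion, the attribute can be placed on an element wrapping the section in question (the product markup below is hypothetical). Note that data-nosnippet only keeps the text out of search snippets; it does not stop Googlebot from crawling the content:

```html
<!-- Regular page content remains eligible for search snippets -->
<p>Product description that may appear in a search snippet.</p>

<!-- data-nosnippet excludes this section's text from snippets only -->
<section data-nosnippet>
  <h2>Customers also purchased</h2>
  <ul>
    <li>Example related product</li>
  </ul>
</section>
```

Google's documentation allows data-nosnippet on span, div, and section elements.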
He assured all listeners that reusing the content in question across multiple pages was not an issue that needed to be addressed.
“There is no need to prevent Googlebot from detecting these types of duplicates,” he added.
Blocking Googlebot access to a website
In response to a question about preventing Googlebot from accessing any part of a website, Illyes provided a straightforward solution.
“The easiest way is robots.txt: if you add a disallow: / for the googlebot user agent, googlebot will leave your site alone as long as you keep that rule there,” explained Illyes.
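The rule Illyes describes would look something like this in a robots.txt file served from the site root (the domain is a placeholder):

```
# https://example.com/robots.txt
# Blocks Googlebot from crawling the entire site
User-agent: Googlebot
Disallow: /
```

Other crawlers are unaffected unless they are given their own User-agent rules.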
For those looking for a more robust solution, Illyes offered another method:
“If you want to block even network access, you need to create firewall rules that load our IP ranges into a deny rule,” he said.
For a list of Googlebot’s IP addresses, see Google’s official documentation.
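As a sketch of what such firewall rules might look like, here is an illustrative nginx fragment. The IP ranges below are placeholders; any real deployment should load the current ranges from Google's published Googlebot IP list rather than hardcoding them:

```
# Illustrative only -- replace with the current ranges from
# Google's published Googlebot IP list before use.
deny 66.249.64.0/27;
deny 66.249.64.32/27;
# ... one "deny" line per published range ...
```

Equivalent deny rules can be written for any firewall or reverse proxy; the key point is that the rules must be kept in sync with Google's published ranges, which can change.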
Although it is impossible to block Googlebot from crawling specific sections of an HTML page, methods such as the data-nosnippet attribute can provide some control over how that content appears in search results.
If you’re considering completely blocking Googlebot from your website, a simple disallow rule in your robots.txt file will suffice. However, more drastic measures, such as creating specific firewall rules, are also possible.
Featured image created by the author using Midjourney.
Source: Google’s Search Off The Record podcast