Prevent Google from crawling certain URLs

Some URLs in your store may not need to be crawled by Google, because their content is not necessarily relevant to your SEO. To prevent Google from crawling certain pages, you can edit your robots.txt file from the Configuration section > SEO > Robots.txt.


What is the robots.txt file?


The robots.txt file tells search engine crawlers which URLs they can access on your store. It is located at the root of your site. 👉 See Google's detailed help on this subject.
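To see how a crawler applies these rules, here is a minimal sketch using Python's standard urllib.robotparser module. The /private/ rule and the example.com URLs are purely illustrative, not taken from your store:

from urllib import robotparser

# A purely illustrative rule set: block every crawler from /private/
rules = [
    "User-agent: *",
    "Disallow: /private/",
]

parser = robotparser.RobotFileParser()
parser.parse(rules)

print(parser.can_fetch("*", "https://example.com/catalog/"))            # True: allowed
print(parser.can_fetch("*", "https://example.com/private/draft.html"))  # False: blocked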


How can you prevent your URLs from being crawled?


To edit your robots.txt file, go to the Configuration section > SEO > Robots.txt and select the manual creation mode.
 
Here's what you'll need to add to your file: 


User-agent: *
Disallow: */d/
Disallow: */f/
Disallow: /URL


The last line, "Disallow: /URL", corresponds to the URL you want to block from crawling. You therefore need to replace "/URL" with the path of the page concerned, without its root (domain name). Here is an example with a product page:
 


The full product URL is: https://bentubentu-en.com/bento-boxes/kid-boxes/double-boite-a-bento.html


The part of the URL to enter in the Disallow rule is the path, starting with a slash: /bento-boxes/kid-boxes/double-boite-a-bento (the trailing .html can be omitted, because robots.txt rules match by prefix). To prevent this URL from being crawled, you will have to add the following to your robots.txt file:

User-agent: *
Disallow: */d/
Disallow: */f/
Disallow: /bento-boxes/kid-boxes/double-boite-a-bento

 

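If you want to double-check a rule before publishing it, here is a minimal sketch using Python's standard library (urllib.parse to extract the path without the domain name, urllib.robotparser to test the rule). It is only an illustration, not a WiziShop tool:

from urllib.parse import urlparse
from urllib import robotparser

full_url = "https://bentubentu-en.com/bento-boxes/kid-boxes/double-boite-a-bento.html"

# The Disallow value is the path of the URL, without the domain name
path = urlparse(full_url).path  # "/bento-boxes/kid-boxes/double-boite-a-bento.html"

parser = robotparser.RobotFileParser()
parser.parse(["User-agent: *", "Disallow: " + path])

print(parser.can_fetch("*", full_url))  # False: the product page is now blocked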


💡 If you want to prevent several pages from being crawled, simply add one Disallow line per URL to your robots.txt file:


User-agent: *
Disallow: */d/
Disallow: */f/
Disallow: /URL
Disallow: /URL
Disallow: /URL
Disallow: /URL
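As a convenience, the same standard-library approach can generate these lines from a list of full URLs. This is only a sketch, and the second URL below is a made-up example:

from urllib.parse import urlparse

# Full URLs of the pages to block (the second one is hypothetical)
urls_to_block = [
    "https://bentubentu-en.com/bento-boxes/kid-boxes/double-boite-a-bento.html",
    "https://bentubentu-en.com/bento-boxes/kid-boxes/another-product.html",
]

lines = ["User-agent: *", "Disallow: */d/", "Disallow: */f/"]
lines += ["Disallow: " + urlparse(url).path for url in urls_to_block]

print("\n".join(lines))  # Paste the output into your robots.txt file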
 


The special case of sort and filter pages


Google's crawlers can sometimes struggle to determine whether sort and filter pages should be crawled. Sort and filter pages are the result pages generated when your visitors display your catalog by ascending price, for example, or select only your products under €25.
 

 
Until now, WiziShop prevented search engines from crawling these URLs by adding them to your robots.txt file by default, so as not to clutter your site's indexing. However, because Search Console does not always know whether this blocking is intentional (a frequently reported error: "Indexed, though blocked by robots.txt"), our indexing system has been changed. Your robots.txt file no longer contains your filter and sort URLs:
 
Disallow: */price-low-to-high
Disallow: */price-high-to-low
Disallow: */alphabetical-a-z
Disallow: */alphabetical-z-a
Disallow: */oldest-products
Disallow: */newest-products
Disallow: */d/
Disallow: */f/
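The * in these rules is a wildcard from Google's robots.txt extensions: it matches any sequence of characters, so */d/ matches any URL whose path contains /d/. Here is a minimal sketch of that matching logic in Python. It is simplified (it ignores Allow rules and longest-match precedence), and note that Python's built-in urllib.robotparser only handles plain prefix rules, not wildcards:

import re

def rule_matches(rule: str, path: str) -> bool:
    # Google-style matching: '*' matches any character sequence,
    # '$' anchors the end of the URL; rules otherwise match as prefixes.
    pattern = re.escape(rule).replace(r"\*", ".*").replace(r"\$", "$")
    return re.match(pattern, path) is not None

print(rule_matches("*/price-low-to-high", "/bento-boxes/price-low-to-high"))  # True
print(rule_matches("*/d/", "/bento-boxes/d/42"))                              # True
print(rule_matches("*/d/", "/bento-boxes/kid-boxes/"))                        # False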
  
Today, we manage all internal links to filters with <button onclick> tags instead of the standard <a href> tag. This configuration prevents search engines from detecting these internal links while still allowing visitors to click them. A "noindex" tag also remains on each generated page to prevent indexing.
 
💬 Do not hesitate to contact your Business Coaches if you have any questions!