
Using robots.txt in order to optimize your site performance and reduce your website load

Web crawlers, also called web spiders or robots, can cause significant load on your website, especially if you are using platforms such as WordPress.

This article contains a couple of quick ways to reduce the load caused by crawlers, but we recommend that you consult with a web designer or other professional for more information on how to implement them effectively.

Here is a sample robots.txt file which is WordPress-friendly (something similar can be applied to any website and platform).

You need to create this file on your computer and upload it via FTP or your control panel's file manager (in cPanel you can create the file directly with the File Manager), and place it in the root folder of each of your domain names (the same folder where your main index.php / index.html file resides).

Simply copy and paste everything between the "---" lines below:

---

User-agent: *
Crawl-delay: 30

Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/
Disallow: /trackback/
Disallow: /index.php
Disallow: /xmlrpc.php
Disallow: /wp-login.php
Disallow: /wp-content/plugins/
Disallow: /comments/feed/

Disallow: /administrator/
Disallow: /bin/
Disallow: /cache/
Disallow: /cli/
Disallow: /components/
Disallow: /includes/
Disallow: /installation/
Disallow: /language/
Disallow: /layouts/
Disallow: /libraries/
Disallow: /logs/
Disallow: /modules/
Disallow: /plugins/
Disallow: /tmp/


User-agent: Yandex
Disallow: /

User-agent: Baiduspider
Disallow: /

User-agent: Googlebot-Image
Disallow: /

User-agent: bingbot
Crawl-delay: 10
Disallow:

User-agent: Slurp
Crawl-delay: 10
Disallow:

---

The sample above slows down the search engines so that they don't aggressively scan your site all at once (it does not affect how often a search engine will crawl your site). It also blocks some spiders entirely: Baiduspider (the Chinese search engine Baidu; keep it blocked unless your site needs to be indexed in Chinese), Yandex (the Russian search engine; allow it if you have a Russian website or visitors from Russia), and Googlebot-Image (which indexes your images for Google Image Search). If you need any of these engines, simply remove that engine's "User-agent" line together with the "Disallow: /" line below it, and leave the remaining code.
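If you want to sanity-check your robots.txt rules before uploading the file, Python's standard urllib.robotparser module can parse them locally. The snippet below is a minimal sketch using an abbreviated copy of the sample above; "SomeBot" is just a made-up stand-in for a generic crawler:

```python
from urllib.robotparser import RobotFileParser

# Abbreviated copy of the sample robots.txt from this article.
rules = """\
User-agent: *
Crawl-delay: 30
Disallow: /wp-admin/
Disallow: /wp-includes/

User-agent: Yandex
Disallow: /

User-agent: bingbot
Crawl-delay: 10
Disallow:
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# A generic crawler may fetch public pages but not the blocked folders.
print(parser.can_fetch("SomeBot", "/about/"))     # True
print(parser.can_fetch("SomeBot", "/wp-admin/"))  # False

# Yandex is blocked from the entire site.
print(parser.can_fetch("Yandex", "/"))            # False

# Crawl-delay values are readable per user agent.
print(parser.crawl_delay("SomeBot"))              # 30
print(parser.crawl_delay("bingbot"))              # 10
```

Keep in mind this only tells you how a well-behaved parser reads the file; each search engine ultimately decides for itself how to honor directives such as Crawl-delay.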

In addition, you can control how Google and Bing index your site, and take measures to slow them down individually. For more information on Google, set up an account with Google Webmaster Tools at http://www.google.com/webmasters/tools/ and follow their directions on how to optimize crawl rates and speeds.

For Bing, please sign up or log in with their webmaster tools at http://www.bing.com/toolbox/webmaster and follow their directions on how to optimize the crawling settings.

There are many other optimizations you can apply to your robots.txt file, but we recommend working with a professional in order to get the best results and avoid breaking your website.

For advanced users:

In addition to the robots.txt implementation, you can block unwanted crawlers (especially those that completely ignore your robots.txt file) using the following .htaccess code:

RewriteEngine On

RewriteCond %{HTTP_USER_AGENT} Baiduspider [OR]
RewriteCond %{HTTP_USER_AGENT} Baiduspider/2.0 [OR]
RewriteCond %{HTTP_USER_AGENT} Baiduspider/3.0 [OR]
RewriteCond %{HTTP_USER_AGENT} MJ12bot/v1.4.5 [OR]
RewriteCond %{HTTP_USER_AGENT} MJ12 [OR]
RewriteCond %{HTTP_USER_AGENT} AhrefsBot/5.1 [OR]
RewriteCond %{HTTP_USER_AGENT} YandexBot/3.0 [OR]
RewriteCond %{HTTP_USER_AGENT} YandexImages/3.0 [OR]
RewriteCond %{HTTP_USER_AGENT} YandexBot
RewriteRule . - [F,L]
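Note that RewriteCond matches its pattern anywhere in the User-Agent header as an unanchored regular expression, so the single pattern "Baiduspider" already covers "Baiduspider/2.0" and "Baiduspider/3.0" (and the unescaped dot in "AhrefsBot/5.1" matches any character). As a rough illustration of how those conditions behave, here is a sketch in Python; the user-agent strings are made-up examples:

```python
import re

# Patterns mirroring the RewriteCond lines above. Like Apache,
# re.search() matches anywhere in the string (unanchored).
BLOCKED_PATTERNS = [
    r"Baiduspider",      # also covers Baiduspider/2.0 and /3.0
    r"MJ12",             # also covers MJ12bot/v1.4.5
    r"AhrefsBot/5.1",    # the unescaped "." matches any character
    r"YandexBot",        # also covers YandexBot/3.0
    r"YandexImages/3.0",
]

def is_blocked(user_agent: str) -> bool:
    """Return True if a request with this User-Agent would hit the [F] (403 Forbidden) rule."""
    return any(re.search(p, user_agent) for p in BLOCKED_PATTERNS)

print(is_blocked("Mozilla/5.0 (compatible; Baiduspider/2.0)"))  # True
print(is_blocked("Mozilla/5.0 (compatible; Googlebot/2.1)"))    # False
```

Because of this unanchored matching, the versioned duplicates in the .htaccess block are harmless but redundant; the shorter patterns alone would block the same bots.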


