Your Joomla Website Is Really Really Slow? Maybe It’s Bingbot!

A client of ours with a high-traffic Joomla website called us and told us that every day, during peak hours, their website slowed to a crawl. So, as usual, we ssh’d to their server and checked the slow query log, and we didn’t notice anything unusual, which was itself unusual (even though long_query_time was set to 1, which means that any query taking over 1 second [the minimum] was recorded in the slow query log). The reason this was unusual was that the MySQL load was high, yet the MySQL slow query log was nearly empty.

So, and we have no idea why, we checked the Apache logs located under /home/domlogs (note that on some Linux servers, such logs are located under /usr/local/apache/domlogs), and we tail’d the main daily (and current) access log using the following command:

tail -500 []

and here’s a glimpse of what we saw:

– – [08/Jan/2016:09:11:10 -0500] “GET [relative-url-1] HTTP/1.1” 200 10105 “-” “Mozilla/5.0 (compatible; bingbot/2.0; +”
– – [08/Jan/2016:09:11:10 -0500] “GET [relative-url-2] HTTP/1.1” 200 9668 “-” “Mozilla/5.0 (compatible; bingbot/2.0; +”
– – [08/Jan/2016:09:11:10 -0500] “GET [relative-url-3] HTTP/1.1” 200 9491 “-” “Mozilla/5.0 (compatible; bingbot/2.0; +”
– – [08/Jan/2016:09:11:10 -0500] “GET [relative-url-4] HTTP/1.1” 200 9724 “-” “Mozilla/5.0 (compatible; bingbot/2.0; +”
– – [08/Jan/2016:09:11:10 -0500] “GET [relative-url-5] HTTP/1.1” 200 9188 “-” “Mozilla/5.0 (compatible; bingbot/2.0; +”

As you can see in the above, Bingbot (the search engine crawler released by Microsoft back in 2010 that serves their own search engine, Bing, as well as Yahoo) hit the website with 5 page requests in one second, and it was doing that every single second! Yes – we are not exaggerating. In fact, sometimes Bingbot would request more than 10 pages in one second, but the norm was to request 5 pages every second. How did we know that it was requesting an average of 5 pages every second? Well, by running the following shell command:

grep 'bing' [] | wc -l

The above command returned 56,348 when we ran it 3 hours after the logs rotated, which meant that Bingbot was crawling an average of about 5.2 pages per second (56,348 / (3600 * 3)).
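If you want a finer-grained view than a raw count, you can group log entries by their timestamp (which is resolved to the second) and see exactly how many pages the bot requests each second. Here is a minimal sketch against a made-up sample log — the path, URLs, and entries below are placeholder data, not our client’s logs:

```shell
# Hypothetical sample log in Apache combined format (placeholder data).
LOG=/tmp/sample_access_log
cat > "$LOG" <<'EOF'
1.2.3.4 - - [08/Jan/2016:09:11:10 -0500] "GET /page-1 HTTP/1.1" 200 10105 "-" "Mozilla/5.0 (compatible; bingbot/2.0)"
1.2.3.4 - - [08/Jan/2016:09:11:10 -0500] "GET /page-2 HTTP/1.1" 200 9668 "-" "Mozilla/5.0 (compatible; bingbot/2.0)"
1.2.3.4 - - [08/Jan/2016:09:11:10 -0500] "GET /page-3 HTTP/1.1" 200 9491 "-" "Mozilla/5.0 (compatible; bingbot/2.0)"
9.8.7.6 - - [08/Jan/2016:09:11:11 -0500] "GET /page-4 HTTP/1.1" 200 9724 "-" "Mozilla/5.0 (some human browser)"
EOF

# Field 4 is the timestamp down to the second; count Bingbot requests per second.
grep 'bingbot' "$LOG" | awk '{ print $4 }' | sort | uniq -c | sort -rn | head

# Average rate over a known window (here, the 3 hours since log rotation):
awk -v hits=56348 -v secs=$((3600 * 3)) 'BEGIN { printf "%.2f pages/sec\n", hits / secs }'
```

The first pipeline prints the busiest seconds at the top, which makes bursts (like the 10-requests-in-one-second spikes we saw) easy to spot.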

Now, we have seen many things in our lives: a three-headed dog (that was in Harry Potter and the Philosopher’s Stone, and the dog’s name was Fluffy), a three-headed monkey (that was in the Tales of Monkey Island game, although, to be fair, we didn’t actually see the three-headed monkey; we were just told about it in the game, so we assumed that it existed), a smiling taxi driver who drove smoothly, conducted an excellent and fun conversation, and seemed happy about everything, and a warm Montreal winter that ended in early January (yes, that really happened back in 2009). But what we hadn’t seen yet was a search engine robot trying to crawl a website with such aggressiveness, and during peak traffic times no less – it wasn’t normal. Obviously, Bingbot looked like the cause of the issue, and we verified that by temporarily blocking it, adding the following lines to the .htaccess file (just after the RewriteEngine On):

RewriteCond %{HTTP_USER_AGENT} bingbot [NC]
RewriteRule .* - [R=403,L]

The above lines essentially block Bingbot from accessing (and thus crawling) the website, and after adding them we noticed that the load dropped from around 10 to around 2. So, the load issue was definitely caused by Bingbot. But, as you might have guessed, we cannot keep this particular bot blocked, since our client would lose their rankings on Bing.
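A side note on that RewriteCond: the [NC] flag makes the match case-insensitive, so any User-Agent string containing “bingbot” in any casing gets the 403. Before deploying such a rule, you can sanity-check which agents it would catch; a case-insensitive grep is a rough stand-in for the RewriteCond substring match (the User-Agent strings below are just sample placeholders):

```shell
# Sample User-Agent strings (placeholders for what appears in your own logs).
BOT_UA="Mozilla/5.0 (compatible; BingBot/2.0)"
HUMAN_UA="Mozilla/5.0 (Windows NT 10.0; Win64; x64)"

# grep -qi approximates the case-insensitive [NC] match of the RewriteCond.
for UA in "$BOT_UA" "$HUMAN_UA"; do
  if printf '%s\n' "$UA" | grep -qi 'bingbot'; then
    echo "BLOCKED (403): $UA"
  else
    echo "allowed: $UA"
  fi
done
```

This kind of dry run is cheap insurance against accidentally blocking human visitors whose browsers happen to match an overly broad pattern.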

So, we needed a way to tell Bingbot to slow down, and some quick research on the subject revealed that Bing was aware of the load issue their bot creates on servers, and that they had created a workaround to address it. The workaround is a small setting in the robots.txt file that tells Bingbot to take it easy a bit: in short, the setting tells Bingbot how many seconds it has to wait before hitting the website with another request.

So, how to tell Bingbot to slow down?

You can tell Bingbot to slow down its crawl rate by adding the following to the end of your robots.txt file (note: the last line, Crawl-delay, is where the most important juice is, but you will need to add all the lines below, not just that one):

# Slow down bing
User-agent: msnbot
Disallow: /administrator/
Disallow: /ads/
Disallow: /bin/
Disallow: /cache/
Disallow: /cli/
Disallow: /components/
Disallow: /includes/
Disallow: /installation/
Disallow: /language/
Disallow: /layouts/
Disallow: /libraries/
Disallow: /logs/
Disallow: /modules/
Disallow: /plugins/
Disallow: /tmp/
Crawl-delay: 5

The value of Crawl-delay (which is 5 in the above example) tells Bingbot to wait 5 seconds between consecutive page requests.
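To put that value in perspective, a quick back-of-the-envelope calculation shows how hard Crawl-delay caps the bot (the 5.21 pages-per-second figure is the rate we measured above):

```shell
# At Crawl-delay: 5, Bingbot fetches at most one page every 5 seconds:
echo $(( 86400 / 5 ))    # at most 17280 pages per day

# Versus the ~5.21 pages/second we were seeing before the change:
awk 'BEGIN { printf "%.0f pages per day\n", 5.21 * 86400 }'
```

So the setting took Bingbot from roughly 450,000 page requests per day down to a hard ceiling of 17,280 – a factor of about 26, which explains the dramatic drop in server load.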

We have to say that we were a bit (well, more than a bit) skeptical at first – we thought that Bingbot would just ignore that setting and keep crawling the website at its usual rate. But we were pleasantly surprised when, the next day, we noticed that Bingbot had reduced its crawling speed to one page every 5 seconds, which had a very positive effect on the server load. The problem was solved!

But, what if Bingbot didn’t respect the “Crawl-delay” setting?

Well, in that case, we would have resorted to drastic measures, ranging from creating a special, module-less template just for Bingbot to completely blocking the crawler by re-adding the above .htaccess lines. Of course, we would have had to inform the client, who would have had to choose between losing 50% of their traffic during peak hours because of Bingbot, or losing about 10% of their traffic (the roughly 10% that represents incoming human traffic from Bing and Yahoo).

But wouldn’t setting the “Crawl-delay” setting to a high number result in an SEO hit specific to Bing/Yahoo?

Well, that’s a tricky question. Bing officially recommends keeping the Crawl-delay number low in order to ensure that Bing always has a fresh version of your website (it doesn’t say anything about an SEO impact when you use that setting). But we think that Bing’s expectations of servers are somewhat unrealistic – not setting a Crawl-delay value will unleash Bing at turbo crawl speed (yes, you just witnessed the first oxymoron in this post) on your website, quickly swamping the server if the website has many pages. While we’re not sure about the SEO implications with Bing, we think they are of little importance compared to server health and responsiveness, since the latter is a well-known ranking factor for Google. And, in all fairness, any website administrator will choose to accommodate Google over Bing any day!

So, if your website is very slow during peak hours and not-so-fast during non-peak hours, then check your Apache logs for aggressive Bingbot crawling. If that’s the case, then try slowing it down with the Crawl-delay setting as described above. If that doesn’t work, or if the issue has nothing to do with Bingbot, then please contact us. We are here for you, we work hard, and we will solve any Joomla problem for a super affordable fee!
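One quick way to gauge whether Bingbot is your culprit is to measure its share of the traffic in the current access log. A minimal sketch against a placeholder log (substitute your own domlogs path; the entries below are stand-in data):

```shell
# Placeholder log with a mix of bot and human requests (stand-in data).
LOG=/tmp/sample_access_log
cat > "$LOG" <<'EOF'
1.2.3.4 - - [08/Jan/2016:09:11:10 -0500] "GET /a HTTP/1.1" 200 10105 "-" "Mozilla/5.0 (compatible; bingbot/2.0)"
1.2.3.4 - - [08/Jan/2016:09:11:11 -0500] "GET /b HTTP/1.1" 200 9668 "-" "Mozilla/5.0 (compatible; bingbot/2.0)"
9.8.7.6 - - [08/Jan/2016:09:11:11 -0500] "GET /c HTTP/1.1" 200 9491 "-" "Mozilla/5.0 (a human browser)"
9.8.7.6 - - [08/Jan/2016:09:11:12 -0500] "GET /d HTTP/1.1" 200 9724 "-" "Mozilla/5.0 (another human browser)"
EOF

# Percentage of all requests that come from Bingbot:
TOTAL=$(wc -l < "$LOG")
BOT=$(grep -ci 'bingbot' "$LOG")
awk -v b="$BOT" -v t="$TOTAL" 'BEGIN { printf "Bingbot share: %.1f%% of %d requests\n", 100 * b / t, t }'
```

If that percentage is high during your slow periods, the bot is a prime suspect; if it is negligible, look elsewhere (slow queries, uncached extensions, or another aggressive crawler).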
