How We Are Handling Traffic from Bots

An emerging and extremely annoying issue that almost all large websites are currently experiencing is bot traffic. Bot traffic has always been an issue, of course, but until recently it was fairly manageable. Lately, however, it has become very aggressive, especially on high-traffic websites. So why is this becoming a major issue all of a sudden?

A couple of months ago, one of our clients asked us to check the validity of their traffic for the previous day as reported by Google Analytics. In case you don’t know, Google Analytics is known for “trying” to exclude bot traffic from its stats, but the numbers that the client was seeing were way out of their daily range. So we checked their data and noticed a huge amount of traffic from cities with very low populations, such as Boardman, OR. A quick examination revealed that traffic from most of these cities originated from an Amazon network, so we filtered out traffic from Amazon networks in Google Analytics. That made things much better, but later on, rogue IPs from other networks started hitting the website, and so the whack-a-mole game started. It was fun at first, but it soon became tedious and annoying. So we decided to automate the process:

  • We got the top IPs visiting the website using a Linux shell script.
  • Any IP hitting the website with more than 10K visits was checked for its network and its location (the checking was done through an API).
  • If it was outside the US, then it was automatically banned (through the .htaccess file).
  • If it was inside the US, then it was sent for human examination, where a person decided whether the traffic from this IP was automated or coming from a legitimate organization (highly unlikely, but still possible).
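
For readers who want to try something similar, the steps above can be sketched roughly as follows. This is a Python illustration and not the shell script we actually ran. It assumes that each access log line starts with the client IP (as in the usual Apache log formats), that the log covers the period you care about, that every logged request counts as a "visit", and that the server runs Apache 2.4 (on Apache 2.2 the rules would be "Deny from" lines instead). The names count_hits, triage, htaccess_rules and lookup_country are made up for this example; lookup_country stands in for whichever geolocation API you use.

```python
import ipaddress
from collections import Counter

THRESHOLD = 10_000  # "more than 10K visits", as in the list above


def client_ip(line):
    """Return the IP at the start of a log line, or None if there isn't one."""
    parts = line.split(maxsplit=1)
    if not parts:
        return None
    try:
        return str(ipaddress.ip_address(parts[0]))
    except ValueError:
        return None


def count_hits(log_lines):
    """Count requests per client IP (assumption: one request = one visit)."""
    counts = Counter()
    for line in log_lines:
        ip = client_ip(line)
        if ip is not None:
            counts[ip] += 1
    return counts


def triage(counts, lookup_country, threshold=THRESHOLD):
    """Split heavy hitters into an auto-ban list and a human-review list.

    lookup_country is a hypothetical callable (IP -> country code such as
    "US") wrapping whatever geolocation API is available.
    """
    to_ban, to_review = [], []
    for ip, hits in counts.most_common():
        if hits <= threshold:
            break  # most_common() is sorted, so the rest are below threshold
        if lookup_country(ip) == "US":
            to_review.append(ip)  # a person decides on these
        else:
            to_ban.append(ip)
    return to_ban, to_review


def htaccess_rules(ips):
    """Render Apache 2.4 rules that block the given IPs."""
    lines = ["<RequireAll>", "    Require all granted"]
    lines += [f"    Require not ip {ip}" for ip in ips]
    lines.append("</RequireAll>")
    return "\n".join(lines)
```

In practice you would feed count_hits an open access log file, pass the result to triage along with your lookup function, append the output of htaccess_rules(to_ban) to the .htaccess file, and send to_review to whoever does the manual check.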

Adopting the above process (which was mostly automated) lessened the load on our clients’ servers and reduced those fake impressions substantially.

Yes – we noticed that we didn’t use the word Joomla anywhere in the article, but this really is an across-the-board problem that affects all kinds of CMSs.

We hope that you found our little post useful. If you are running into the same issue, then please contact us. Our rates are fair, our work is scientific, and our Sherlock Holmes investigative techniques are improving by the day!
