An Intelligent Approach to URL Canonicalization in Joomla

We are currently working on a very large project (yes, we’re even working on the weekends) – this project consists of building a Joomla website from scratch for a government agency (many government agencies use Joomla – see why Joomla is a good choice for government websites). Naturally, the project has an SEO aspect – and for such a huge project, you can imagine the amount of SEO work that needs to be done (both at the development level and at the content level). Our job was to only handle SEO at the development level, so we ensured that:

  • All URLs are SEF (Search Engine Friendly) URLs. Of course, that’s an easy thing to do if you don’t have any custom extensions, but once you do, you need to develop some code to make the URLs of your custom extensions search engine friendly.
  • All the pages – even the pages generated by our custom extensions – have the following head tags:

    • Page Title
    • Meta description
    • Meta keywords

    We generate the above tags automatically and we also allow the administrator to override that by entering these tags manually.

  • All the URLs are uniform and consistent. For example, all the URLs have www, and we use absolute links (instead of relative links) across the board.
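
To illustrate the last point, here is a minimal Python sketch of how a link could be made uniform (the real project is a Joomla/PHP site, so this is not our production code). The function name `to_uniform_url` and the `SITE_ROOT` constant are hypothetical, and the host is simply the example host used in this post:

```python
from urllib.parse import urljoin, urlsplit, urlunsplit

# Hypothetical site root, taken from the example URLs in this post.
SITE_ROOT = "http://www.ourclientjoomlawebsite.com/"


def to_uniform_url(href: str, page_url: str = SITE_ROOT) -> str:
    """Return an absolute URL on the www host for a link found on page_url."""
    # Resolve relative links against the page they appear on.
    absolute = urljoin(page_url, href)
    parts = urlsplit(absolute)
    host = parts.netloc.lower()
    root_host = urlsplit(SITE_ROOT).netloc
    # Only rewrite our own bare domain; external links are left alone.
    if "www." + host == root_host:
        host = root_host
    return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))
```

With this sketch, both `clients/client-1.html` and `http://ourclientjoomlawebsite.com/clients/client-1.html` come out as the same absolute www URL.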

A problem that we ran into was that, in many cases, we had the exact same content (but with a very slight difference, such as a different breadcrumb) under multiple URLs. For example, the following two links:

http://www.ourclientjoomlawebsite.com/clients/client-1.html

and

http://www.ourclientjoomlawebsite.com/division-1/clients/client-1.html

had the exact same content with the exception of a different breadcrumb. We had three options to solve the problem:

  1. Add a “noindex” value to the robots meta tag in the duplicate pages: This meant that we needed to know which pages should be indexed and which should not. That was a lot of work!
  2. Add a “Disallow” entry for all the duplicate pages in the robots.txt file: Again, the problem with this approach was that we needed to have a list of the pages that should not be indexed.
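
For reference, here is a rough Python sketch of what these two options boil down to. The `DUPLICATE_URLS` list and both function names are hypothetical. The point is that both options depend on a hand-maintained list of duplicates:

```python
from typing import List
from urllib.parse import urlsplit

# Hand-maintained list of duplicate URLs (hypothetical): this is exactly
# the list that options 1 and 2 force someone to keep up to date.
DUPLICATE_URLS = [
    "http://www.ourclientjoomlawebsite.com/division-1/clients/client-1.html",
]

NOINDEX_TAG = '<meta name="robots" content="noindex">'


def robots_meta_for(url: str) -> str:
    """Option 1: return the noindex tag for a duplicate page, else an empty string."""
    return NOINDEX_TAG if url in DUPLICATE_URLS else ""


def robots_txt_lines() -> List[str]:
    """Option 2: build robots.txt rules that block every listed duplicate."""
    lines = ["User-agent: *"]
    lines += [f"Disallow: {urlsplit(url).path}" for url in DUPLICATE_URLS]
    return lines
```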

Not only did the above two methods involve a lot of work, but they also meant that the client had to do some SEO work on his part, something that neither the client nor we wanted to happen. So we thought of a third option, which is a much more intelligent approach to URL canonicalization in Joomla (and in CMSs in general):

  3. Dynamically generate canonical URLs: This is how it works:
    • When the website is first launched, there’ll be no URL canonicalization whatsoever.
    • Every time a URL is visited by a human (and not a robot), a certain weight is added to that URL. The weight percentage of that URL is the weight of that URL divided by the total weight of all URLs that have the same content.

    • Once a URL’s weight is above 10 (that number could be changed) and the weight percentage of that URL is above 70%, then that URL will be the canonical URL. Once that happens, we will stop weighting all URLs that have that same content, as we have already found their canonical URL.

    The last method is an intelligent approach to URL canonicalization, because it’s fully automatic and it’s based on how users actually access your website, which is the best way to weigh URLs, especially if your navigation is logical and correct.
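
The weighting steps above can be sketched in a few lines of Python. This is only an illustration under some assumptions of ours, not the Joomla code we wrote: each human visit adds a weight of 1, pages with the same content share a `content_id`, robot detection happens elsewhere and arrives as an `is_robot` flag, and everything is kept in memory. The names `CanonicalTracker`, `record_visit` and `canonical_link_tag` are hypothetical:

```python
from collections import defaultdict
from html import escape
from typing import Dict, Optional

MIN_WEIGHT = 10    # threshold from the post ("that number could be changed")
MIN_SHARE = 0.70   # weight percentage a URL must exceed


class CanonicalTracker:
    def __init__(self) -> None:
        # content_id -> {url: weight}; content_id is an assumed identifier
        # shared by all URLs that display the same content.
        self.weights: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
        self.canonical: Dict[str, str] = {}

    def record_visit(self, content_id: str, url: str, is_robot: bool = False) -> Optional[str]:
        """Weigh one visit; return the canonical URL for the content once decided."""
        if content_id in self.canonical:
            return self.canonical[content_id]  # weighting has stopped
        if is_robot:
            return None  # robot visits carry no weight
        group = self.weights[content_id]
        group[url] += 1  # assumption: every human visit weighs 1
        total = sum(group.values())
        if group[url] > MIN_WEIGHT and group[url] / total > MIN_SHARE:
            self.canonical[content_id] = url
            return url
        return None


def canonical_link_tag(url: str) -> str:
    """Build the tag a page would output once its canonical URL is known."""
    return f'<link rel="canonical" href="{escape(url, quote=True)}" />'
```

For example, if the longer URL gets 2 human visits and the shorter one gets 11, the shorter URL has a weight above 10 and a weight percentage of 11/13 (about 85%), so it becomes the canonical URL for both pages.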

If you need to implement intelligent URL canonicalization (no, we did not patent the concept – but it’s us who invented it!), then why not ask us to help you? We have done it before (for the government!), we have the best Joomla experts on the planet, we don’t charge much, and we really love to work on challenging projects!
