On the Careless Usage of MySQL’s DISTINCT in Joomla’s com_content

As of Joomla 3.8.0 (and so far until Joomla 3.8.2), the articles model in the com_content extension (both frontend and backend) uses DISTINCT in the main query that returns the articles (in the getListQuery function). For the untrained eye, such a change is harmless, but, to anyone who has some experience in MySQL (and any other relational database), this is a dangerous change that can have devastating effects on large Joomla sites with high traffic. Why? Well, before explaining why, let us first examine how DISTINCT was introduced to the Joomla core…

On August 22nd, 2017, a GitHub user called JannikMichel (who is a user with a very small number of contributions), proposed this pull request. In short, the pull request was meant to redo/enhance this 3 year old pull request that added article filtering on multiple access levels/authors/categories/tags. So, when you are in the Articles page in the backend of a Joomla site, you will be able to filter on multiple categories (or authors, or tags, etc…).

Now, since Joomla allows only one access level/author/category per article, you won’t have a problem if you filter on multiple access levels/authors/categories. For tags, it’s a different story: Joomla allows an article to have multiple tags (tags are nothing but trouble by the way), so, if you filter on multiple tags, the underlying query can return the same article multiple times. To address this issue, the following code in Joomla 3.7.5 (in the articles.php file which is located under the components/com_content/models folder)…

$query->select(
	$this->getState(
		'list.select',
		'a.id, a.title, a.alias, a.introtext, a.fulltext, ' .
			'a.checked_out, a.checked_out_time, ' .
			'a.catid, a.created, a.created_by, a.created_by_alias, ' .
			// Published/archived article in archive category is treats as archive article
			// If category is not published then force 0
			'CASE WHEN c.published = 2 AND a.state > 0 THEN 2 WHEN c.published != 1 THEN 0 ELSE a.state END as state,' .
			// Use created if modified is 0
			'CASE WHEN a.modified = ' . $db->quote($db->getNullDate()) . ' THEN a.created ELSE a.modified END as modified, ' .
			'a.modified_by, uam.name as modified_by_name,' .
			// Use created if publish_up is 0
			'CASE WHEN a.publish_up = ' . $db->quote($db->getNullDate()) . ' THEN a.created ELSE a.publish_up END as publish_up,' .
			'a.publish_down, a.images, a.urls, a.attribs, a.metadata, a.metakey, a.metadesc, a.access, ' .
			'a.hits, a.xreference, a.featured, a.language, ' . ' ' . $query->length('a.fulltext') . ' AS readmore'
	)
);

…was changed to the following in Joomla 3.8.0:

$query->select(
	$this->getState(
		'list.select',
		'DISTINCT a.id, a.title, a.alias, a.introtext, a.fulltext, ' .
		'a.checked_out, a.checked_out_time, ' .
		'a.catid, a.created, a.created_by, a.created_by_alias, ' .
		// Published/archived article in archive category is treats as archive article
		// If category is not published then force 0
		'CASE WHEN c.published = 2 AND a.state > 0 THEN 2 WHEN c.published != 1 THEN 0 ELSE a.state END as state,' .
		// Use created if modified is 0
		'CASE WHEN a.modified = ' . $db->quote($db->getNullDate()) . ' THEN a.created ELSE a.modified END as modified, ' .
		'a.modified_by, uam.name as modified_by_name,' .
		// Use created if publish_up is 0
		'CASE WHEN a.publish_up = ' . $db->quote($db->getNullDate()) . ' THEN a.created ELSE a.publish_up END as publish_up,' .
		'a.publish_down, a.images, a.urls, a.attribs, a.metadata, a.metakey, a.metadesc, a.access, ' .
		'a.hits, a.xreference, a.featured, a.language, ' . ' ' . $query->length('a.fulltext') . ' AS readmore'
	)
);

By using DISTINCT, the SQL code in Joomla 3.8.0+ ensures that no row is returned twice by the query, which is a good thing. Unfortunately, this results in a much bigger problem than the original problem: DISTINCT is heavy, very heavy, especially when it is combined by GROUP BY (which is the case in the getListQuery function), in which case it will most likely need to create a temporary table (which results in a huge performance hit). A quick testing for the above query on a large Joomla site revealed that the query is, on average, 5-6 times slower when using DISTINCT. Since the above query is one of the most executed queries on a Joomla site, then it may will substantially increase the load on the server, making the site unusable especially during peak hours.

Not only is the usage of DISTINCT very harmful, it is also disappointing for several reasons:

  • It was approved by multiple Joomla core developers who consider themselves experts. We can count 3 (yes, three) expert core developers who checked this pull request and tested it successfully. This is alarming, since it means that core Joomla developers do not really read the code of rookie developers, which, in most cases, is unoptimized and buggy.
  • It wasn’t added in a condition. There is no need whatsoever to use DISTINCT when there is no multiple filtering on the tags. There must be an if condition for the usage of DISTINCT.

  • It was added to the backend and the frontend articles model. While one can understand the reasons behind adding DISTINCT to the backend, there is no explanation whatsoever on why it was added to the frontend. We have yet to see a Joomla site that allows multiple tag filtering on the frontend.

For our large clients, we are removing DISTINCT from the articles model (both from the frontend and from the backend). We always try our best to avoid core alterations, but in this situation, we have no choice. It is either a core modification or an extremely slow site.

We hope that you found this post useful. If you have any questions about it, or if you need us to optimize your Joomla website, then please contact us. We are experts in Joomla optimization, our work is very delicate, and our fees are always affordable.

2 Responses to “On the Careless Usage of MySQL’s DISTINCT in Joomla’s com_content”
  1. Comment by Victor — January 29, 2018 @ 4:55 pm

    Hello,

    I removed DISTINCT from the articles.php files as explained in your article and keep seeing error that crashes my MYSQL server. Any other solution?

    I´m using Mariadb version 5.5.56-2.el7.

    I read here ( https://github.com/joomla/joomla-cms/issues/18882 ) that maybe this is a bug on this particular MariaDB version, and that updating to 5.8 solves the issue. I will try it…

  2. Comment by Fadi — February 13, 2018 @ 1:15 pm

    Hi Victor,

    Removing DISTINCT by itself will improve the server’s performance by lessening the load on the database server, however, it doesn’t guarantee that the load will return to a manageable level. It seems that your site needs additional optimization. We will have a post soon on how to optimize a Joomla 3.8.x website. Stay tuned!

Leave a comment