Using Sphinx to Substantially Enhance the Speed of the Joomla Search

Warning: This is an extremely advanced post that requires some serious system administration skills and some serious programming skills. If you feel you’re not up to it, then don’t do it! Ask some Joomla experts (such as, ahem, us!) to do the implementation for you – you’ll save time and your blood pressure will not suffer!

Warning: The implementation in this post requires you to modify a core Joomla file, which means that you will have to ensure that new updates won’t overwrite your modification. If you’re not comfortable with core changes then please contact us and we can develop a new search plugin for you instead of modifying the core plugin.

Note: If you want to rebuild all the Sphinx indexes from scratch, then you will need to stop the searchd listener first using the following command: /usr/local/sphinx/bin/searchd --stop .

Note: Our solution will work. We suggest you don’t deviate from the guide below until everything works for you, and then you can try experimenting with different settings.

Note: While the process below is straightforward and safe, we suggest you take a backup of your whole server before doing anything. Well, at least you should take a backup of your filesystem, your database, and your my.cnf file (this is the MySQL configuration file which is located under the etc folder).

Final note: Every step below is essential for Sphinx to work! You can’t eliminate any steps from our guide and expect that Sphinx works for you. You have been warned!

Let’s face it: Joomla’s search is inefficient – and by inefficient we mean that searches start taking ages once you have over 20k articles on your website. Since we have many clients who are using Joomla and who have a lot of articles on their sites, we have decided to do something about the search! We have decided to use Sphinx…

We first learned about Sphinx a couple of years ago when a large company contacted us and told us that they would like to integrate Sphinx on a mega Joomla website in order to speed up the searches. At that time, our plates were full and we didn’t have time to take on such an endeavor, especially since we have never done it before, so we politely refused (that’s unlike us, but sometimes there is no other option). We were curious nonetheless, and we kept the name Sphinx in the back of our minds!

Fast forward to a couple of weeks ago, when we felt obliged that we had to do something about the search performance (or un-performance, that is) for our major clients’ sites! The searches were just taking too long and causing load issues on the server. It was our duty to do something! And so we thought about giving Sphinx a shot, and boy we were impressed and we never regretted it! Yes – it was a frustrating and downright scary experience, but the end results were astonishing! And since we love sharing and we really want Joomla to become the best CMS ever, we have decided to write a post that will fully explain how we did it!

Are you ready for this super-adventure? Let’s start!

How to install Sphinx on CentOS

Before doing anything, we needed to install the Sphinx search engine on CentOS (CentOS is the underlying OS for mostly all WHM/cPanel powered servers and it is the OS that most of our clients use). Here’s what we did:

  • We logged in to the shell (ssh) as root.
  • We issued the following commands (in order):

    cd /usr/local
    mkdir sphinx
    cd sphinx
    wget http://sphinxsearch.com/files/sphinx-2.2.6-release.tar.gz
    tar -zxf sphinx-2.2.6-release.tar.gz
    cd sphinx-2.2.6-release
    mv * /usr/local/sphinx/
    cd ..
    rm -Rf sphinx-2.2.6-release
    rm -Rf sphinx-2.2.6-release.tar.gz
    ./configure --prefix=/usr/local/sphinx --with-mysql
    make
    make install

  • The above commands installed Sphinx on the server. Note that some of them take a long time to execute, so be patient!

Despite Sphinx being installed on the server (yuppy!), we still couldn’t do anything with it because we needed to configure the Sphinx Index

How to configure the Sphinx index for Joomla’s search

Sphinx is a powerful database engine – but it can’t do much without an index. In this context, an index is a collection of data stored in a specific way for fast retrieval. In the case of a Joomla search, an index consists of a MySQL query that will grab the id, the title, and the introtext and the fulltext (concatenated as text) from the #__content table.

In order to create the index, we needed to first create the Sphinx configuration file. Here’s how we did it:

  • While still logged in to ssh, we issued the following command:

    vi /usr/local/sphinx/etc/sphinx.conf

    The above command will open a new file ready to write using the vi editor (you can use nano, but we’re used to vi – it’s more nerdish).

  • We then clicked on i (yes, the letter i on the keyboard), and then pasted (you can paste in vi, and in linux in general, by just right-clicking) the following code (for your convenience, we have added extensive comments next to each line in order to explain what each line does):

    # source: The source will the a SQL query on the #__content table. The SQL query will be consist of the id, the title, the introtext, and the fulltext.
    # We must specify the connection parameters to the MySQL database, which can be found in the configuration.php file which is located under the root directory of your Joomla website.
    # Note that we are assuming that your MySQL host is localhost (in other words, your MySQL database server resides on the same server as your Joomla website). If this is not the case, then you will need to change localhost and make it point to the right host of your MySQL database server.
    
    source joomla_search_source
    {
    	type		= mysql # Do not change this
    	sql_host	= localhost # You will need to change this if your MySQL server resides on a different server.
    	sql_user	= [your-joomla-db-user] # The username which has at least read access to your MySQL database. No brackets.
    	sql_pass	= [your-joomla-db-password] # The password of the aforementioned username. No brackets.
    	sql_db		= [name-of-your-joomla-database] # The name of your Joomla database. No brackets.
    
    	# Make sure you only include the fields that are being used in your searches - otherwise, you will inflate the size of the index and slow down the searches.
    	# We will just index the id, the title, and a concatenation of the introtext and the fulltext.
    	# Note: You should replace #__ with your table alias, or else the below query (and the index) will not work!
    	sql_query	= SELECT `id`, `title`, CONCAT(`introtext`, `fulltext`) AS `text` FROM #__content WHERE `state`=1
    }
    
    # index: Here we define the parameters of the index created by Sphinx. We will call this joomla_search.
    # If you want to create more indexes then you will need to create another source (like the one above) and another index (like this one) and ensure that the source attribute in the index settings points to the name of the source defined in your source.
    # Note: We initially called the index joomla-search, but everytime we tried to test a query we got the following error: Query failed: unknown local index 'joomla' in search request. It turned out that you cannot have a hyphen (a dash) in the name of the Sphinx index, so we changed it to an underline.
    
    index joomla_search
    {
    	source		= joomla_search_source # Note that the source is the same one specified as the name of the source above.
    	path		= /usr/local/sphinx/var/data/joomlasearch # The indexed data will be stored in this location.
    	min_word_len	= 4 # Sphinx will ignore any words (in the search string) that are less than 4 characters long. You might want to change this to 3 if you have keywords which are 3 characters long. If not, then it's better to leave it as it is as smaller keywords tend to be costly on any search engine.
    }
    
    # indexer: These are the settings of the indexer. Here we just want to increase the memory limit allocated to the indexer, which defaults to a very low value (32 megabytes).
    indexer
    {
    	mem_limit	= 1024M # 1024 megabytes is a comfortable number. If you set mem_limit to a low value then the indexing can take a long a time, if you set it too high then MySQL can timeout.
    }
    
    # searchd: The search engine paramters.
    searchd
    {
    	log		= /usr/local/sphinx/var/log/searchd.log # This is the location of the Sphinx log - logs are always good when something goes wrong.
    	max_children	= 50 # Leave this as 50.
    	pid_file	= /usr/local/sphinx/var/log/searchd.pid # No need to change this at all.
    }

  • We saved the configuration file by typing the following (while still in vi):

    :wq!

Now that we have created the configuration file for the index, we had to build the index…

Building the Sphinx index

The configuration file was ready, so now it was time to build the index. We built it using the following shell command:

/usr/local/sphinx/bin/indexer --config /usr/local/sphinx/etc/sphinx.conf --all

The above command generated the following output:

Sphinx 2.2.6-id64-release (r4843)
Copyright (c) 2001-2014, Andrew Aksyonoff
Copyright (c) 2008-2014, Sphinx Technologies Inc (http://sphinxsearch.com)

using config file ‘/usr/local/sphinx/etc/sphinx.conf’…
indexing index ‘joomla_search’…
WARNING: Attribute count is 0: switching to none docinfo
collected 54142 docs, 154.4 MB
sorted 15.1 Mhits, 100.0% done
total 54142 docs, 154423963 bytes
total 19.459 sec, 7935558 bytes/sec, 2782.25 docs/sec

When we saw the output above, we were rejoiced! It seemed that our configuration file worked and Sphinx was able to index our query! We’re close, very close!

Starting the searchd listener

Now that Sphinx indexed our query, we were ready to start the searchd listener. searchd is a process that will listen on port 9312 for any Sphinx queries and will reply back with results. In order to start searchd, all we needed was to issue the following shell command:

/usr/local/sphinx/bin/searchd

Here’s what we saw when we ran the above command:

Sphinx 2.2.6-id64-release (r4843)
Copyright (c) 2001-2014, Andrew Aksyonoff
Copyright (c) 2008-2014, Sphinx Technologies Inc (http://sphinxsearch.com)

using config file ‘/usr/local/sphinx/etc/sphinx.conf’…
listening on all interfaces, port=9312
listening on all interfaces, port=9306
precaching index ‘joomla_search’
precached 1 index in 0.020 sec

Now we have Sphinx running and listening! Hurray! But, before moving to anything else, we needed to make sure that 1) new data will be indexed every midnight and 2) the listener will run when the server is rebooted.

In order to accomplish that, we did the following:

  • We ran the following command from the shell:

    crontab -e

  • We added the following lines to the end of the cron file:

    0 0 * * * /usr/local/sphinx/bin/indexer --config /usr/local/sphinx/etc/sphinx.conf --rotate joomla_search > /dev/null 2>&1
    @reboot /usr/local/sphinx/bin/searchd

    The first line will rotate the index at midnight (in other words, it will re-index the SQL query), and the second line will ensure that the searchd process runs automatically when the server reboots. Both lines are very important because the first is responsible for keeping the index up-to-date while the second is responsible for ensuring that searches still work when the server is restarted.

  • We saved the cron using CTRL + o, and then exited using CTRL + x.

Now, that we have everything setup, all that we needed to do was to modify the Joomla search plugin to use Sphinx! Here’s how we did it:

  • We opened the file content.php located under the plugins/search/content folder.
  • In the function onContentSearch, we replaced everything above the line $results = array(); with the following code:

    $db = JFactory::getDbo();
    $app = JFactory::getApplication();
    
    require_once JPATH_SITE . '/components/com_content/helpers/route.php';
    require_once JPATH_ADMINISTRATOR . '/components/com_search/helpers/search.php';
    
    require('/usr/local/sphinx/api/sphinxapi.php');
    $sp = new SphinxClient();
    $sp->SetServer('localhost', 9312);
    $sp->SetMatchMode(SPH_MATCH_ALL);
    $sp->SetArrayResult(true);
    $sp->SetLimits(0,1000);
    $limit = $this->params->def('search_limit', 50);
    
    
    switch ($ordering)
    {
    	case 'relevant':
    		break;
    	default:
    		$sp->SetSortMode( SPH_SORT_EXTENDED, 'id DESC');
    		break;
    }
    $results = $sp->Query($text, 'joomla_search');
    $results = $results['matches'];
    $list = array();
    for ($i = 0; $i < count($results); $i++){
    	$sql = "SELECT a.title, a.created, CONCAT(`introtext`, `fulltext`) AS `text`, c.title AS `section`,  CONCAT(a.`id`, ':', a.`alias`) AS `slug`,  CONCAT(c.`id`, ':', c.`alias`) AS `catslug` FROM #__content a, #__categories c WHERE a.catid = c.id AND a.id='".$results[$i]['id']."'";
    	$db->setQuery($sql);
    	$list[] = $db->loadObject();
    }
    $rows[] = $list;

  • We uploaded the file back and we tested the search functionality on the website.

  • We were amazed by the speed difference: our search results were returned instantly on a website with more than 50k articles! Wow - just wow! Oh, and what was even better is that we stopped seeing queries in the slow query log and, even better yet, the load dropped significantly!

So, how long did it take us to implement Sphinx for Joomla's search for the first time?

Well, about 6-7 days (because of the many trials and errors)! But we've learned a lot (again, we have never used Sphinx before) and we have documented the whole process so that you, our precious reader, can do it (hopefully) in less time and less hassle!

We hope that you found our guide useful and you were able to successfully integrate Sphinx with Joomla's search by following it. If you're having problems with the installation of Sphinx or with the integration with Joomla's search, or if the whole thing is just a bit over your head, then please contact us. We will do the work for you in no time and for a very affordable fee!

4 Responses to “Using Sphinx to Substantially Enhance the Speed of the Joomla Search”
  1. Comment by Aitor — January 27, 2015 @ 7:52 am

    Would it be possible with Sphinx to extend the features to provide search within certain categories, or certain authors? Something like Finder taxonomies’ do in the newest versions of Joomla?

  2. Comment by Fadi — January 27, 2015 @ 2:06 pm

    Yes – you just need to add 2 other indexes (one to search the categories and another to search the authors). You will then need to rebuild the search index (don’t forget to add the new indexes to the cron as well as described above). You will finally need to modify the search plugin to search these indexes as well.

  3. Comment by Alain — June 30, 2017 @ 9:33 am

    Hello,

    It works but the only problem is that each title is not a link just plain text…

  4. Comment by Fadi — August 24, 2017 @ 2:03 pm

    Hi Alain,

    This is likely a layout issue – your template has probably an override somewhere and not displaying the links. Technically, the layout is not determined or generated by the search plugin.

Leave a comment