The Mysterious Intermittent MySQL Crash on a Joomla Website

Most people love mystery movies; there is something thrilling about them. We also love mystery movies, but there is something that we love even more: Joomla mysteries. Yesterday, we were fortunate enough to be assigned to investigate one of these mysteries…

At around eleven hundred hours yesterday morning, a regular client of ours told us that they were seeing the following error on their Joomla site:

Error displaying the error page: Application Instantiation Error: Could not connect to MySQL.

Of course, the above error is probably the most common fatal error on Joomla sites, and we have written about it before. Yet this time, it was different, because a few minutes later, the client called back and told us that the site was working again. Hmmm… It was the first time we had seen this issue resolve itself. That was weird, we thought. We asked the client if they wanted to investigate further, but they told us that they were OK; it was just a hiccup…

A couple of hours later, they emailed us and told us that they were seeing the same error message that they had seen in the morning. By the time we received the email and checked the site again, the error was gone. So, we emailed the client back and recommended that they authorize an investigation into this issue, because instability issues such as these are never a good sign. The client immediately authorized the investigation, and the first thing that we checked was the MySQL error log, which is located under the /var/lib/mysql folder (the file name typically ends with .err). Here’s what we saw:

2017-11-07 11:03:29 82870 [Note] Plugin 'FEDERATED' is disabled.
2017-11-07 11:03:29 82870 [Note] InnoDB: Using atomics to ref count buffer pool pages
2017-11-07 11:03:29 82870 [Note] InnoDB: The InnoDB memory heap is disabled
2017-11-07 11:03:29 82870 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2017-11-07 11:03:29 82870 [Note] InnoDB: Memory barrier is not used
2017-11-07 11:03:29 82870 [Note] InnoDB: Compressed tables use zlib 1.2.3
2017-11-07 11:03:29 82870 [Note] InnoDB: Using Linux native AIO
2017-11-07 11:03:29 82870 [Note] InnoDB: Using CPU crc32 instructions
2017-11-07 11:03:29 82870 [Note] InnoDB: Initializing buffer pool, size = 64.0M
2017-11-07 11:03:29 82870 [Note] InnoDB: Completed initialization of buffer pool
2017-11-07 11:03:29 82870 [Note] InnoDB: Highest supported file format is Barracuda.
2017-11-07 11:03:29 82870 [Note] InnoDB: Log scan progressed past the checkpoint lsn 65178299172
2017-11-07 11:03:29 82870 [Note] InnoDB: Database was not shutdown normally!
2017-11-07 11:03:29 82870 [Note] InnoDB: Starting crash recovery.
2017-11-07 11:03:29 82870 [Note] InnoDB: Reading tablespace information from the .ibd files...
2017-11-07 11:03:29 82870 [Note] InnoDB: Restoring possible half-written data pages
2017-11-07 11:03:29 82870 [Note] InnoDB: from the doublewrite buffer...
InnoDB: Doing recovery: scanned up to log sequence number 65178300463
InnoDB: 1 transaction(s) which must be rolled back or cleaned up
InnoDB: in total 1 row operations to undo
InnoDB: Trx id counter is 379714816
2017-11-07 11:03:29 82870 [Note] InnoDB: Starting an apply batch of log records to the database...
InnoDB: Progress in percent: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99
InnoDB: Apply batch completed
InnoDB: Starting in background the rollback of uncommitted transactions
2017-11-07 11:03:30 82870 [Note] InnoDB: 128 rollback segment(s) are active.
2017-11-07 11:03:30 7f6cf1e16700  InnoDB: Rolling back trx with id 379714508, 1 rows to undo
2017-11-07 11:03:30 82870 [Note] InnoDB: Waiting for purge to start
2017-11-07 11:03:30 82870 [Note] InnoDB: Rollback of trx with id 379714508 completed
2017-11-07 11:03:30 7f6cf1e16700  InnoDB: Rollback of non-prepared transactions completed
2017-11-07 11:03:30 82870 [Note] InnoDB: 5.6.38 started; log sequence number 65178300463
2017-11-07 11:03:30 82870 [Note] Server hostname (bind-address): '*'; port: 3306
2017-11-07 11:03:30 82870 [Note] IPv6 is available.
2017-11-07 11:03:30 82870 [Note]   - '::' resolves to '::';
2017-11-07 11:03:30 82870 [Note] Server socket created on IP: '::'.
2017-11-07 11:03:30 82870 [Note] Event Scheduler: Loaded 0 events
2017-11-07 11:03:30 82870 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.6.38'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  MySQL Community Server (GPL)
2017-11-07 13:11:21 11675 [Note] Plugin 'FEDERATED' is disabled.
2017-11-07 13:11:22 11675 [Note] InnoDB: Using atomics to ref count buffer pool pages
2017-11-07 13:11:22 11675 [Note] InnoDB: The InnoDB memory heap is disabled
2017-11-07 13:11:22 11675 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2017-11-07 13:11:22 11675 [Note] InnoDB: Memory barrier is not used
2017-11-07 13:11:22 11675 [Note] InnoDB: Compressed tables use zlib 1.2.3
2017-11-07 13:11:22 11675 [Note] InnoDB: Using Linux native AIO
2017-11-07 13:11:22 11675 [Note] InnoDB: Using CPU crc32 instructions
2017-11-07 13:11:23 11675 [Note] InnoDB: Initializing buffer pool, size = 64.0M
2017-11-07 13:11:23 11675 [Note] InnoDB: Completed initialization of buffer pool
2017-11-07 13:11:23 11675 [Note] InnoDB: Highest supported file format is Barracuda.
2017-11-07 13:11:23 11675 [Note] InnoDB: Log scan progressed past the checkpoint lsn 65220975371
2017-11-07 13:11:24 11675 [Note] InnoDB: Database was not shutdown normally!
2017-11-07 13:11:24 11675 [Note] InnoDB: Starting crash recovery.
2017-11-07 13:11:24 11675 [Note] InnoDB: Reading tablespace information from the .ibd files...
2017-11-07 13:11:27 11675 [Note] InnoDB: Restoring possible half-written data pages
2017-11-07 13:11:27 11675 [Note] InnoDB: from the doublewrite buffer...
InnoDB: Doing recovery: scanned up to log sequence number 65221104018
2017-11-07 13:11:30 11675 [Note] InnoDB: Starting an apply batch of log records to the database...
InnoDB: Progress in percent: 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99
InnoDB: Apply batch completed
2017-11-07 13:11:33 11675 [Note] InnoDB: 128 rollback segment(s) are active.
2017-11-07 13:11:33 11675 [Note] InnoDB: Waiting for purge to start
2017-11-07 13:11:33 11675 [Note] InnoDB: 5.6.38 started; log sequence number 65221104018
2017-11-07 13:11:33 11675 [Note] Server hostname (bind-address): '*'; port: 3306
2017-11-07 13:11:34 11675 [Note] IPv6 is available.
2017-11-07 13:11:34 11675 [Note]   - '::' resolves to '::';
2017-11-07 13:11:34 11675 [Note] Server socket created on IP: '::'.
2017-11-07 13:11:35 11675 [Note] Event Scheduler: Loaded 0 events
2017-11-07 13:11:35 11675 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.6.38'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  MySQL Community Server (GPL)

You can clearly see that, within 2 hours, MySQL restarted twice, by itself! But why? This is where the whole thing started to spin into a fun Joomla mystery! We looked at all the logs for a sign that could lead us to the real cause of the problem, and, after many hours of investigation, we found a clue. It was the following lines in the messages file, which is located under the /var/log folder:

Nov  7 11:03:21 host kernel: [42008818.160677] Out of memory: Kill process 39364 (mysqld) score 27 or sacrifice child
Nov  7 11:03:21 host kernel: [42008818.161356] Killed process 39364, UID 498, (mysqld) total-vm:1402564kB, anon-rss:16576kB, file-rss:252kB
Nov  7 11:03:22 host kernel: [42008818.858715] php invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0

If you read the above lines carefully, you will understand what was happening. The system ran out of memory, and the kernel's OOM (Out of Memory) killer had to make a decision: either kill the mysqld process or sacrifice one of its children. In the next line, you can see that the kernel decided to kill the mysqld process and immediately went ahead with that decision, which brought MySQL down. The MySQL server was then restarted automatically (most likely by the mysqld_safe wrapper), which explains why the error kept disappearing on its own a few minutes later.
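
By the way, if you suspect that the same thing is happening on your server, a couple of quick checks along the lines below can confirm it. We are assuming a CentOS/RHEL style server here, where the kernel messages go to /var/log/messages; on other distributions the file may be called syslog, and on systemd based servers you can use journalctl -k instead:

# Search the kernel log for OOM killer activity
grep -i "out of memory\|oom-killer" /var/log/messages

# Check the tail of the MySQL error log for unexpected restarts
# (the exact file name depends on your hostname, e.g. host.err)
tail -n 100 /var/lib/mysql/*.err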

OK, now we know it’s a memory issue. But why? Doesn’t their server have enough memory? We issued the following command in the shell to find out how much memory the server has…

free -m

… and it returned the following:

             total       used       free     shared    buffers     cached
Mem:          1815       1545        270          0        130       1004
-/+ buffers/cache:        409       1405
Swap:         1999        159       1840

Aha! So this server has less than 2 GB of RAM (it is obviously a VPS, and not a dedicated server), and only about 270 MB of it was free at that moment. Of course, Linux allocates otherwise unused memory to buffers and cache to speed things up, so roughly 1.4 GB can be reclaimed under pressure, but the kernel log above shows that, at peak times, 2 GB is simply too little for this website, as the server is running out of memory.
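
If you want to know which processes are actually consuming the memory, a command along these lines can help (it sorts the running processes by their resident memory usage, so the biggest consumers, typically mysqld and the PHP processes, appear at the top):

# List the 10 processes using the most resident memory (RSS)
ps aux --sort=-rss | head -n 10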

We offered the client two solutions: 1) we optimize the queries on the site in order to reduce the memory usage on the database server, or 2) they upgrade to a real dedicated server with at least 16 GB of RAM.
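
For those who prefer the first solution, a good starting point is usually to find the heavy queries by enabling MySQL's slow query log. A minimal sketch of the relevant my.cnf settings is below; keep in mind that the log file location and the values are just examples and should be adjusted to your own environment:

[mysqld]
# Log queries that take longer than 2 seconds so that they can be reviewed and optimized
slow_query_log      = 1
slow_query_log_file = /var/lib/mysql/slow-query.log
long_query_time     = 2

# Optionally, also log queries that do not use indexes
log_queries_not_using_indexes = 1

(Note that, when added to my.cnf, these settings take effect after a MySQL restart; they can also be enabled at runtime with SET GLOBAL.)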

The client opted for the second solution (the hardware upgrade), which is, in our opinion, the ideal one.

So there you have it: if you have intermittent downtime on your MySQL server, then check your logs (especially the /var/log/messages file) to see if your server is running out of memory. If that’s the case, then the best route you can take is to add resources to your server (or to migrate to a more powerful server that has more RAM). If you want to take an alternative route, then just contact us. We are always ready to help, our fees are right, and our work is ultra professional!
