pimux.de is facing some connectivity issues

For the past few weeks the pimux.de XMPP server has been facing connectivity issues, which got worse within the last few days. Thanks to a friendly user I was made aware of these issues, which had already triggered my monitoring a few times.

25-02-2024 - 1 minute, 35 seconds -

Perhaps you have noticed some connection problems recently. Your XMPP client may have shown errors because it could not connect to your account at pimux.de, or you were not able to join a group chat or start a conversation. I was aware of some performance issues related to a database cleanup that runs at night (in Germany; about 3am UTC), so I was not that alarmed.

[Screenshot 2024-01-12 at 22-17-02: pimux.de - XMPP server]

One friendly user sent me an email and informed me that he had noticed connectivity issues, too. In the past few days I noticed them too, and they got a lot worse. A maintenance-related downtime of about a minute or two per day would be okay, but the problems for pimux.de got worse: the downtime got longer and the problems occurred during the day, too, as you can see in this status dashboard: https://uptimekuma.finnchristiansen.de/status/pimux

An uptime of about 97.7 percent is not acceptable, so I started looking for the root cause of these problems. A database cleanup job related to Message Archive Management caused a lot of disk I/O, which stopped Prosody from processing any new connections. Today I tweaked the configuration, and I will inspect Message Archive Management further to figure out what is going wrong here. A daily downtime of about 30 minutes in total is totally unacceptable.
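To put the uptime figure next to the downtime figure, here is a small sketch of the arithmetic. It assumes the downtime is averaged evenly over a 24-hour day; the function names are made up for illustration and are not part of any monitoring tool.

```python
MINUTES_PER_DAY = 24 * 60  # 1440


def daily_downtime_minutes(uptime_percent: float) -> float:
    """Average downtime per day, in minutes, implied by an uptime percentage."""
    return (100.0 - uptime_percent) / 100.0 * MINUTES_PER_DAY


def uptime_percent(downtime_minutes_per_day: float) -> float:
    """Uptime percentage implied by an average daily downtime in minutes."""
    return 100.0 * (1.0 - downtime_minutes_per_day / MINUTES_PER_DAY)


if __name__ == "__main__":
    # 97.7 percent uptime corresponds to roughly half an hour of downtime per day.
    print(round(daily_downtime_minutes(97.7), 1))  # 33.1
    # A maintenance window of two minutes per day would still be about 99.86 percent.
    print(round(uptime_percent(2), 2))  # 99.86
```

So 97.7 percent uptime works out to roughly 33 minutes per day, which matches the "about 30 minutes" above, while a one or two minute maintenance window would stay above 99.8 percent.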

I just wanted to let you know that I am aware of this issue and will find a solution. Thank you all for using pimux.de as your XMPP server!

👋 By the way: Text me at finn@pimux.de if you have any questions.

I have also attached a screenshot of my Grafana dashboard, which shows the high I/O load while cleaning up the database. It looks pretty bad, to be honest.

[Screenshot 2024-01-12 at 20-50-47: Node Exporter Full - Dashboards - Grafana]

Update: This issue was solved a few days later by removing a faulty user account that had stored millions of messages in the database, probably caused by a loop.
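This post does not describe how the account was tracked down. As a rough sketch of the general idea only, one way to spot such an outlier is to count archived messages per account and flag anything far above the typical count. Everything below is hypothetical: the function name, the `factor` threshold, and the sample data are invented for illustration, and a real check would run as a grouped count against whatever storage backend holds the archive.

```python
from collections import Counter
from statistics import median


def find_outlier_accounts(archive_owners, factor=100):
    """Return (account, count) pairs whose archive size exceeds
    `factor` times the median archive size across all accounts.

    `archive_owners` is an iterable with one entry per archived
    message, naming the account that owns it (hypothetical input).
    """
    counts = Counter(archive_owners)
    if not counts:
        return []
    typical = median(counts.values())
    return [(user, n) for user, n in counts.most_common() if n > factor * typical]


if __name__ == "__main__":
    # Invented sample: three normal accounts and one that looped.
    sample = ["alice"] * 40 + ["bob"] * 55 + ["carol"] * 60 + ["looping-bot"] * 50_000
    print(find_outlier_accounts(sample))  # [('looping-bot', 50000)]
```

Comparing against the median rather than the mean keeps a single huge archive from hiding itself by dragging the average up.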