Please compress the database dump #354
According to my personal benchmarks, it is. Higher levels result in a large speed loss for a minimal compression gain, while lower levels yield minimal speed gains for noticeably worse compression. This holds even on a Raspberry Pi, which is not a powerful machine.
Most of the time this will speed up the backup, as you write massively less data for a small CPU overhead.
Is compression compatible with borg incremental backups? What do you think @zamentur ?
It's probably compatible (partially de-duplicable) if compressing in an rsync-friendly mode. I'm not sure about borg's deduplication algorithm, but I imagine it works similarly to rsync, chunking files based on a rolling hash, so this should still work on compressed input. Edit: yes, it should work. borg uses a rolling hash (buzhash / cyclic polynomial), so it creates content-based chunks; if large portions of the input data do not change, the corresponding chunks still deduplicate. Edit 2: there is an example use-case combining borg and compressed dumps.
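For illustration, a sketch of the kind of dedup-friendly compression meant here (using zstd's `--rsyncable` flag is my assumption, not something stated above; database name and user are placeholders):

```bash
# Compress the dump so that chunk-based deduplication (borg, rsync) can still find
# unchanged blocks: --rsyncable periodically resets the compressor state, so a small
# change in the input only affects a bounded region of the output.
sudo -u postgres pg_dump matrix_synapse | zstd -3 -T0 --rsyncable -o dump.sql.zst
```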
What do you mean by "compatible"?
🆙 🙂
Borg works per chunk (part of a file), not per file, so creating an archive from a compressed dump should still deduplicate reasonably well. With the new synapse upgrade, I hit this again (38 GB). Since the upgrade failed, I am now 3 hours in, and counting.
Very nice suggestion (db backup compression). We should have that for every app, I guess.
So I added a ticket for that in the YunoHost issue tracker, since it's a general use case for all backups and could be very useful. On my side I found another way to deal with it: reduce the size of the db ^^
Well, imho, all the backups (or at least the parts that could be significantly compressed) should be compressed by default, or at least we should have that possibility… well, we discussed that on the forum. But yeah, lacking that feature, compressing the db export just makes a lot of sense, as it's often very easy to compress and the storage gain can be very significant. Integrating this into the core would be a very nice improvement.
Exactly. My database is 200 GB, so I can never even update Synapse from the YNH panel since it would take an hour or so. I do it via SSH with the option to skip the backups, simply because the database is enormous. And when Borg backs up, it takes 200 GB of space on my server before uploading to Borg, which is again massive. Synapse is the only YNH app we have that cannot be properly managed via the YNH panel.
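For reference, a sketch of the SSH workaround mentioned above (the exact flag name is an assumption on my part and may differ between YunoHost versions):

```bash
# Upgrade without the pre-upgrade safety backup (flag name assumed).
yunohost app upgrade synapse --no-safety-backup
```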
Out of curiosity, how did it grow to such an enormous size?
I store the media in a separate folder that's just 14 GB in size. We have around 500 users. Maybe that's why!? Is there a way to see what takes up so much space? EDIT: I see this https://matrix-org.github.io/synapse/latest/usage/administration/database_maintenance_tools.html and this https://levans.fr/shrink-synapse-database.html - I wonder how to get the access token with the YNH install. Seems like a bit of work, but I will try it. If anyone knows a simpler solution, like a script that does these steps automatically, please let me know.
I think we need this tool to compress the Synapse state tables: https://github.com/matrix-org/rust-synapse-compress-state - even the devs recognized this is an issue and made this tool. Any way to have it packaged for YNH?
Ok after days of optimization I managed this:
Still a lot! But well, three times smaller. What did I do? First, compress using this: https://github.com/matrix-org/rust-synapse-compress-state#building . You need to install that package and run it along these lines:
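A sketch of the invocation, based on the tool's README (connection string, chunk sizes and file names are placeholders; check the README for the exact flags):

```bash
# Run the state compressor in the background so it survives the SSH session ending.
nohup synapse_auto_compressor \
    -p "postgresql://synapse_user:password@localhost/matrix_synapse" \
    -c 500 -n 100 > compress.log 2>&1 &
```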
That runs it in the background. Then I had to REINDEX:
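Something along these lines (the database name matrix_synapse is an assumption about the synapse_ynh setup):

```bash
# Rebuild all indexes of the synapse database; synapse should be stopped for this.
sudo -u postgres psql -d matrix_synapse -c "REINDEX DATABASE matrix_synapse;"
```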
And then to VACUUM:
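Again a sketch (same assumption about the database name):

```bash
# VACUUM FULL rewrites the tables and returns the freed space to the OS.
# It takes exclusive locks, so synapse must be stopped while it runs.
sudo -u postgres psql -d matrix_synapse -c "VACUUM FULL;"
```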
The compression took around 3 days of nonstop work and put a huge toll on the CPU: a 10-core CPU at 100%. The other steps take around 3-4 hours each, and they need a lot of space on your disk: the database is duplicated on the server before the reindex and vacuum, then the copy is deleted. I do not know if I can do more to reduce the size, but even at this size it is not easy to manage via the YNH panel... too big to back up, too big to restore.
Wow, thanks a lot for documenting this! Did you stop the synapse service in the meantime?
You have to stop synapse while you do the REINDEX and VACUUM FULL, not during the compression. VACUUM FULL is just there to give the disk space back to the server. Basically, the compression can cut your database from, say, 200 GB to 100 GB, but you won't see that on your disk, because postgresql still shows 200 GB and keeps the "empty" space reserved for the database, adding new data into that empty space. In other words, your database won't grow anymore, but on disk it still looks like 200 GB is taken by the synapse database.
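If you want to check what the database actually occupies on disk, something like this works (database name assumed, as above):

```bash
# Show the on-disk size of the synapse database as postgresql sees it.
sudo -u postgres psql -c "SELECT pg_size_pretty(pg_database_size('matrix_synapse'));"
```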
It's clear to me, thanks :)
Yeah, I used a similar tool a while ago, I also used to have a DB of more than 100G, with myself as the only user 😇 I am uncertain if compression would help a lot with the backup/restore speed, it may be 30% faster but would still likely take hours. However, it would help quite a bit with disk space.
You can also drop old room history from the database, for rooms not originating on one's server; the assumption is that synapse will fetch it again from remote servers if needed. But we stray from the topic :)
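For reference, a hedged sketch of synapse's purge-history admin API (room id, server name, token and timestamp are placeholders; see the synapse admin API docs for the exact semantics):

```bash
# Delete remote events older than purge_up_to_ts (ms since epoch) in one room,
# keeping locally-originated events.
curl -X POST "https://matrix.example.org/_synapse/admin/v1/purge_history/%21roomid%3Aexample.org" \
    -H "Authorization: Bearer <admin_access_token>" \
    -H "Content-Type: application/json" \
    -d '{"delete_local_events": false, "purge_up_to_ts": 1672531200000}'
```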
30% of a couple of hours is already a big deal!
Rather than compressing the database dump, why not compress the database itself?
https://wiki.chatons.org/doku.php/services/messagerie_instantanee/matrix#nettoyage_d_un_serveur_synapse The other advantage would be that the synapse_ynh package could kind of "guarantee" that the message and media retention policy is actually applied, by cleaning, purging, and freeing space in the DB.
Wouldn't that hurt performance?
Maybe the easiest way would be to just put the executable in the installation path at install time, and upgrade this tool at the same time as synapse when a new version is released?
All people having issues with a huge database should check #478.
Describe the bug
My synapse database is big: dumps can take more than 100 GB. My latest one (single-user server) is 32.8 GB.
Writing and reading that file to disk takes a long time, not to mention the wasted disk space.
Backups used to be compressed, but back then they were first tar-ed, then compressed. Both stages took a while.
Suggested solution
Pipe the postgresql dump to a (fast, multithreaded) compressor:
synapse_ynh/scripts/backup, line 79 in c80e8fc
Change this to pipe the dump through the compressor:
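A sketch of what the change could look like (the helper and variable names are assumptions, not the actual content of scripts/backup):

```bash
# Pipe the SQL dump straight into zstd instead of writing the raw dump to disk.
ynh_psql_dump_db --database="$db_name" | zstd -3 -T0 -o "${YNH_CWD}/dump.sql.zst"
```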
(Ideally this would combine nice and ionice, like so:)
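The same sketch with nice and ionice, so the dump and the compression run at low CPU and I/O priority:

```bash
# ionice -c 3 = idle I/O class; nice -n 19 = lowest CPU priority.
nice -n 19 ionice -c 3 ynh_psql_dump_db --database="$db_name" \
    | nice -n 19 zstd -3 -T0 -o "${YNH_CWD}/dump.sql.zst"
```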
On the restore side:
synapse_ynh/scripts/restore, line 121 in c80e8fc
Hmm, not straightforward here. Either make a fifo with mkfifo and pass this as the path, or change the helper/introduce a helper without the redirection there:
https://github.com/YunoHost/yunohost/blob/4b9e26b974b0cc8f7aa44fd773537508316b8ba6/helpers/postgresql#L78-L79
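A sketch of the fifo approach (the helper name and arguments are assumptions; adapt to whatever the restore script actually calls):

```bash
# Decompress into a named pipe and hand that path to the existing restore helper.
fifo=$(mktemp -u)
mkfifo "$fifo"
zstd -dc ./dump.sql.zst > "$fifo" &
ynh_psql_execute_file_as_root --file="$fifo" --database="$db_name"
rm -f "$fifo"
```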
Expected gains
zstd level 3 gets an old dump from 17 GB down to 3.8 GB. Level 7 only gets this down to 3.5 GB. Level 1 (minimum, fastest) reaches 4.2 GB.
Both archive creation and dump time should be faster, as less data needs to be written to disk, especially on hard disks.
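One way to reproduce this kind of comparison on an existing plain-text dump (file name is a placeholder):

```bash
# Compare compression levels 1, 3 and 7: wall-clock time and resulting size.
for lvl in 1 3 7; do
    /usr/bin/time -v zstd -"$lvl" -T0 -k -f dump.sql -o "dump.sql.$lvl.zst"
    ls -lh "dump.sql.$lvl.zst"
done
```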
Sample runs
These runs were collected with some I/O in the background (synapse restoration in progress).
This suggests that level 3 is probably good enough and that compressing barely adds any time to the operation, at least on my machine (powerful CPU, relatively slow disk on the backup partition).