RFE: Offline deduplication #206

Open
DemiMarie opened this issue Apr 3, 2022 · 9 comments

Comments

@DemiMarie

There are cases where it would be quite useful to be able to compact a thin pool by deduplicating identical blocks while the system is offline.
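To make the request concrete, here is a minimal sketch of offline block-level dedup over a toy in-memory model of a thin pool. `ThinPool`, `alloc`, and `dedup` are hypothetical names. Real dm-thin metadata lives in on-disk B-trees managed by the kernel and thin-provisioning-tools, and nothing here reads it. The idea is to hash every allocated data block, point each volume mapping that references a duplicate at the first copy, and free the duplicates.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ThinPool:
    """Hypothetical toy model of a thin pool, not real dm-thin metadata."""

    blocks: dict[int, bytes] = field(default_factory=dict)  # block id -> data
    volumes: dict[str, list[int]] = field(default_factory=dict)  # name -> block ids
    next_id: int = 0

    def alloc(self, data: bytes) -> int:
        """Store a new data block and return its id."""
        block_id = self.next_id
        self.blocks[block_id] = data
        self.next_id += 1
        return block_id

    def used_blocks(self) -> int:
        return len(self.blocks)


def dedup(pool: ThinPool) -> int:
    """Point duplicate blocks at one shared copy; return how many blocks were freed."""
    canonical: dict[bytes, int] = {}  # SHA-256 digest -> first block id with that digest
    remap: dict[int, int] = {}  # duplicate block id -> block id that replaces it
    for block_id in sorted(pool.blocks):
        data = pool.blocks[block_id]
        keep = canonical.setdefault(hashlib.sha256(data).digest(), block_id)
        # Compare the bytes as well, so a hash collision never merges different data.
        if keep != block_id and pool.blocks[keep] == data:
            remap[block_id] = keep
    # Rewrite every volume's mapping, then release the now-unreferenced duplicates.
    for name, mapping in pool.volumes.items():
        pool.volumes[name] = [remap.get(b, b) for b in mapping]
    for block_id in remap:
        del pool.blocks[block_id]
    return len(remap)


pool = ThinPool()
pool.volumes["vm1"] = [pool.alloc(b"A" * 4096), pool.alloc(b"B" * 4096)]
pool.volumes["vm2"] = [pool.alloc(b"A" * 4096), pool.alloc(b"C" * 4096)]
print(pool.used_blocks())   # 4
print(dedup(pool))          # 1
print(pool.used_blocks())   # 3
print(pool.volumes["vm2"])  # [0, 3]
```

The extra byte comparison after the digest match keeps a hash collision from silently merging different data. Rewriting mappings like this while volumes are in use would race with writers, which is part of why doing it offline is attractive.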

@jthornber
Owner

jthornber commented Apr 3, 2022 via email

@DemiMarie
Author

Could that be used to make a thin_dedup tool?

@brendanhoar

Just for reference, @tasket's Wyng backup might be an interesting project to take a glance at (not equivalent, but maybe some overlap):

https://github.com/tasket/wyng-backup

B

@tasket

tasket commented Apr 4, 2022

I think Demi is looking for a tool to deduplicate thin volumes in-place. From comments I've read in Linux discussions (and here?), I gathered that this would not be on the thinp roadmap.

OTOH, it seems like a narrowly targeted form of dedup could be approximated for two target volumes by scanning for differences, snapshotting one volume, then updating it with the mapped differences (and finally replacing the snapshotted original with the snapshot).
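One reading of this two-volume approach, sketched on a hypothetical toy pool rather than LVM's real interface: take a snapshot of volume `a`, write into that snapshot only the chunks where volume `b` differs, then drop the original `b` and let the snapshot take its name. The result has `b`'s contents but shares every unchanged chunk with `a`. `Pool` and `rebase` are illustrative names only, and the final "replace" step is my interpretation of the comment.

```python
from dataclasses import dataclass, field


@dataclass
class Pool:
    """Hypothetical toy thin pool: block id -> data, volume name -> block ids."""

    blocks: dict[int, bytes] = field(default_factory=dict)
    volumes: dict[str, list[int]] = field(default_factory=dict)
    next_id: int = 0

    def alloc(self, data: bytes) -> int:
        self.blocks[self.next_id] = data
        self.next_id += 1
        return self.next_id - 1

    def read(self, vol: str, i: int) -> bytes:
        return self.blocks[self.volumes[vol][i]]

    def snapshot(self, src: str, dst: str) -> None:
        # A snapshot copies only the mapping, so every chunk starts out shared.
        self.volumes[dst] = list(self.volumes[src])

    def write(self, vol: str, i: int, data: bytes) -> None:
        # Copy-on-write: the written chunk gets a new block; the rest stay shared.
        self.volumes[vol][i] = self.alloc(data)

    def remove(self, vol: str) -> None:
        # Delete a volume and free any blocks no other volume references.
        del self.volumes[vol]
        live = {b for mapping in self.volumes.values() for b in mapping}
        self.blocks = {b: d for b, d in self.blocks.items() if b in live}


def rebase(pool: Pool, base: str, target: str) -> int:
    """Rebuild `target` as a snapshot of `base` plus only the chunks that differ."""
    if len(pool.volumes[base]) != len(pool.volumes[target]):
        raise ValueError("volumes must be the same size")
    tmp = target + ".rebased"
    pool.snapshot(base, tmp)
    written = 0
    for i in range(len(pool.volumes[target])):
        data = pool.read(target, i)
        if pool.read(base, i) != data:  # scan for differences
            pool.write(tmp, i, data)
            written += 1
    pool.remove(target)  # drop the original; its blocks are freed unless still referenced
    pool.volumes[target] = pool.volumes.pop(tmp)  # the snapshot takes over the name
    return written


pool = Pool()
pool.volumes["a"] = [pool.alloc(b"base%d" % i) for i in range(4)]
pool.volumes["b"] = [pool.alloc(b"base%d" % i) for i in range(3)] + [pool.alloc(b"new")]
print(len(pool.blocks))        # 8
print(rebase(pool, "a", "b"))  # 1
print(len(pool.blocks))        # 5
```

This only recovers sharing between the two volumes you pair up, which is what makes it "narrowly targeted" compared with a pool-wide dedup.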

@tasket

tasket commented Apr 4, 2022

FWIW, Wyng can facilitate this as part of a restore from an archive (using a sparse write mode to update an existing volume, it will skip over chunks that match). But that means performing a backup first.
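The sparse-write idea can be sketched as follows. This is not Wyng's code. The manifest of per-chunk SHA-256 digests, the `fetch` callback, and the 64 KiB chunk size are all assumptions for illustration. Each chunk already on the target volume is hashed and compared with the archive's digest, and only mismatching chunks are fetched and written. Unchanged chunks of a thin snapshot therefore keep sharing blocks with its origin.

```python
import hashlib
from typing import Callable, Sequence

CHUNK_SIZE = 64 * 1024  # hypothetical; assumes the volume size is a multiple of it


def sparse_restore(
    path: str,
    manifest: Sequence[bytes],
    fetch: Callable[[int], bytes],
    chunk_size: int = CHUNK_SIZE,
) -> int:
    """Restore into an existing volume, writing only the chunks that differ.

    `manifest[i]` is the SHA-256 digest of archive chunk i and `fetch(i)`
    returns its data; both stand in for a real backup archive (hypothetical).
    Returns the number of chunks written.
    """
    written = 0
    with open(path, "r+b") as vol:
        for index, digest in enumerate(manifest):
            offset = index * chunk_size
            vol.seek(offset)
            if hashlib.sha256(vol.read(chunk_size)).digest() == digest:
                continue  # already matches: no write, so no new block gets allocated
            vol.seek(offset)
            vol.write(fetch(index))  # only differing chunks are fetched and written
            written += 1
    return written
```

For example, restoring a cloned VM's volume onto a fresh snapshot of its template would write only the chunks where the clone had diverged.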

@DemiMarie
Author

> I think Demi is looking for a tool to deduplicate thin volumes in-place.

That’s correct. My goal is to be able to reclaim shared space on a Qubes OS system.

@brendanhoar

A particular use case for Qubes: when backing up and then restoring thin LVs that were snapshots (e.g. of cloned Qubes OS VMs), using most methods, one usually ends up with much more space used up after the restore than before, because while the original pool had much more sharing of blocks, after all of the LVs are restored to a new thin pool, no blocks are being shared.

B

@DemiMarie
Author

> A particular use case for Qubes: when backing up and then restoring thin LVs that were snapshots (e.g. of cloned Qubes OS VMs), using most methods, one usually ends up with much more space used up after the restore than before, because while the original pool had much more sharing of blocks, after all of the LVs are restored to a new thin pool, no blocks are being shared.
>
> B

This can actually be disastrous, as it can make backups impossible to restore: without the original sharing, the restored volumes may no longer fit in the pool. Deduplication during restore is necessary to prevent this problem.

@jthornber
Owner

jthornber commented Apr 26, 2022 via email
