RFE: Offline deduplication #206
Comments
Not quite the same thing, but look at the dm-archive tool I'm currently
working on.
…On Sun, 3 Apr 2022, 20:34 Demi Marie Obenour, ***@***.***> wrote:
There are cases where it would be quite useful to be able to compact a
thin pool by deduplicating identical blocks while the system is offline.
Could that be used to make a thin_dedup tool?
Just for reference, @tasket's wyng backup might be an interesting project to take a glance at (not equivalent but maybe some overlap): https://github.com/tasket/wyng-backup
B
I think Demi is looking for a tool to deduplicate thin volumes in-place. From comments I've read in Linux discussion (and here?) I gathered that this would not be on the thinp roadmap. OTOH, it seems like a narrowly-targeted form of dedup could be approximated for two target volumes by scanning for differences, snapshotting one volume, then updating it with the mapped differences (and finally replacing the snapshotted original with the snapshot).
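To make the "scanning for differences" step concrete, here is a minimal sketch of a chunk-by-chunk comparison of two volumes. It is only an illustration: `differing_chunks` and `CHUNK_SIZE` are hypothetical names, the chunk size is an assumption rather than the pool's real block size, and nothing here touches thin-pool metadata or takes snapshots.

```python
import io

# Assumed chunk size for illustration only; a real thin pool has its own block size.
CHUNK_SIZE = 64 * 1024


def differing_chunks(vol_a, vol_b, chunk_size=CHUNK_SIZE):
    """Yield the byte offset of every chunk that differs between two open binary files."""
    offset = 0
    while True:
        a = vol_a.read(chunk_size)
        b = vol_b.read(chunk_size)
        if not a and not b:
            break  # both volumes are exhausted
        if a != b:
            yield offset
        offset += chunk_size


if __name__ == "__main__":
    # In-memory stand-ins for two related volumes that differ in the middle chunk.
    first = io.BytesIO(b"A" * 8 + b"B" * 8 + b"C" * 8)
    second = io.BytesIO(b"A" * 8 + b"X" * 8 + b"C" * 8)
    print(list(differing_chunks(first, second, chunk_size=8)))
```

The offsets it yields would be the "mapped differences" to write into the snapshot; everything else stays shared with the origin.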
FWIW, Wyng can facilitate this as part of a restore from an archive (using a sparse write mode to update an existing volume, it will skip over chunks that match). But that means performing a backup first.
That’s correct. My goal is to be able to reclaim shared space on a Qubes OS system.
A particular use case for Qubes: when backing up and then restoring thin LVs that were snapshots (e.g. of cloned QubesOS VMs), using most methods, one usually ends up with much more space used up after the restore than before, because while the original pool had much more sharing of blocks, after all of the LVs are restored to a new thin pool, no blocks are being shared.
B
This can actually be disastrous, as it can make backups impossible to restore. Deduplication during restore is necessary to prevent this problem.
dm-archive (which I'm going to rename to blk-archive) will check to see if
it's restoring to a thin device. If it is, it will read the mappings *and*
read the data, then do minimal writes to restore the backup. This
is a flexible approach because it allows us to regain sharing between any
two related thin devices. E.g., the backup might have been taken a month ago
and restored to a snapshot of the current head.
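As a rough illustration of the "minimal writes" idea, the sketch below restores a source image over an existing target and writes only the chunks whose contents differ. This is not blk-archive's actual code: `sparse_restore` is a hypothetical name, the chunk size is an assumption, and the sketch compares data only, without reading thin mappings.

```python
import io


def sparse_restore(source, target, chunk_size=4096):
    """Copy source over target, writing only chunks that differ.

    Both arguments are open binary files; target must be readable and writable.
    Returns the number of chunks actually written.
    """
    written = 0
    offset = 0
    while True:
        new = source.read(chunk_size)
        if not new:
            break  # end of the backup image
        target.seek(offset)
        old = target.read(len(new))
        if old != new:
            # Only a differing chunk is rewritten, so matching chunks keep
            # whatever sharing they already have in the thin pool.
            target.seek(offset)
            target.write(new)
            written += 1
        offset += len(new)
    return written


if __name__ == "__main__":
    backup = io.BytesIO(b"aaaabbbbcccc")
    snapshot = io.BytesIO(b"aaaaXXXXcccc")
    print(sparse_restore(backup, snapshot, chunk_size=4))
```

On a thin snapshot, each skipped chunk stays shared with the origin, which is where the space saving comes from. The sketch does not truncate a target that is longer than the source.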
…On Mon, 25 Apr 2022 at 22:28, Demi Marie Obenour ***@***.***> wrote:
A particular use case for Qubes: when backing up and then restoring thin
LVs that were snapshots (e.g. of cloned QubesOS VMs), using most methods,
one usually ends up with much more space used up after the restore than
before, because while the original pool had much more sharing of blocks,
after all of the LVs are restored to a new thin pool, no blocks are being
shared.
B
This can actually be disastrous, as it can make backups impossible to
restore. Deduplication during restore is necessary to prevent this problem.
There are cases where it would be quite useful to be able to compact a thin pool by deduplicating identical blocks while the system is offline.
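For a sense of what such an offline pass would look for, here is a minimal sketch that hashes fixed-size blocks of a volume and counts how many are unique. It only estimates potential savings; `count_duplicate_blocks` is a hypothetical name, the block size is an assumption, and a real tool would need to confirm matches byte-for-byte and then rewrite the pool's mappings, which this does not attempt.

```python
import hashlib
import io
from collections import Counter


def count_duplicate_blocks(volume, block_size=4096):
    """Return (total_blocks, unique_blocks) for an open binary file."""
    seen = Counter()
    total = 0
    while True:
        block = volume.read(block_size)
        if not block:
            break
        # Blocks with the same SHA-256 digest are treated as candidates for sharing.
        seen[hashlib.sha256(block).digest()] += 1
        total += 1
    return total, len(seen)


if __name__ == "__main__":
    image = io.BytesIO(b"A" * 4 + b"B" * 4 + b"A" * 4 + b"A" * 4)
    total, unique = count_duplicate_blocks(image, block_size=4)
    print(f"{total - unique} of {total} blocks could be shared")
```

The gap between total and unique blocks is an upper bound on what deduplication could reclaim at that block size.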