Unmanaged datasets destroyed at boot time after import of zpool.cache #103
Thanks for the bug report! The only place in the zpool where people shouldn't create datasets manually is USERDATA. The rest of the pool is yours (rpool/edata, for instance, or rpool/home, will never be destroyed and is kept unmanaged by ZSys). I agree that the ZSys warning log is wrong, though: it shouldn't flag those datasets as unmanaged and put them on the to-be-deleted list. A set of blog posts, from basic to advanced users, is going to be published on how ZSys interacts with the system. I will be sure to cover this there, and maybe also in the FAQ (if you are interested in drafting a page, I will happily merge it into the repo). EDIT: Another way to see it is to ignore any dataset that is not a clone of any user state on any machine detected by ZSys. The issue with that approach is that people who reinstall their machine without resetting their pool, and who don't know how ZFS works, would never get an opportunity to reclaim this additional space, but it would be doable. We need to think about the impact of this decision.
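To illustrate the namespace rule in the comment above (a sketch only — the helper name and the path check are assumptions for illustration, not zsys's actual code):

```go
package main

import (
	"fmt"
	"strings"
)

// isReservedUserdata reports whether a dataset sits under the
// <pool>/USERDATA namespace reserved by ZSys for user states.
// Illustrative helper only; not taken from the zsys codebase.
func isReservedUserdata(dataset string) bool {
	parts := strings.Split(dataset, "/")
	return len(parts) >= 2 && parts[1] == "USERDATA"
}

func main() {
	fmt.Println(isReservedUserdata("rpool/USERDATA/pub")) // true: reserved for ZSys
	fmt.Println(isReservedUserdata("rpool/edata"))        // false: yours, left unmanaged
}
```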
Thank you for the quick reaction. Please excuse me if this is a naive perspective, but given those assumptions, I did not expect this behaviour. This regression might as well explain why ZFS support is labelled experimental in the latest LTS, but under no circumstances should it delete data without asking when the user follows good practice and sane defaults. A note on how the datasets can be recovered, if there is any chance, would also be helpful.
After some more research, I am looking at the garbage-collection code. This is the part where simple changes could remove unmanaged datasets from garbage collection: Lines 55 to 64 in e027ea8
As the name already implies, an unmanaged dataset should not be managed. To cite the article:
The IRC logs linked in the thread also mention
Combining the above, it becomes even harder for me to understand (1) why my personal datasets are considered snapshots in the first place, as they did not have any, and (2) why this has been enabled for the LTS release. It seems my very own data is treated as a manual clone, following the naming above.
Fortunately the systemd unit does not run with it (Line 9 in e027ea8), similar to what we find at Line 48 in e027ea8.
In summary (sorry for the long answer, but I prefer to explain the different causes and consequences, which also helps me think about this issue): this needs a little bit of thought. Sorry for your experience of unexpected dataset deletion, but I think we'll find a way around it.
Do you have a reference for this? USERDATA has never been a convention on ZFS. ZSys came up with this naming for subspacing while drafting the specification in April 2019. The name was discussed with ZFS upstream (who maintain the ZFS-on-root Ubuntu guide), and this is where we settled on it. So no, this is not an existing convention.
So, even if a "don't touch outside of rpool" rule seems trivial at first, the reality is way different, for the above reasons and per advanced users' requests. It is for all those reasons that we have, until now, seen USERDATA as a reserved namespace. It seems we were wrong (I would still welcome documentation about this convention and how we happened to match a multi-year-old convention by chance).
Your suggestion about limiting to rpool doesn't work, for the reasons I gave about the multi-machine and multi-pool user cases. And as you have fallen into this issue, I will bet that the next bug report will be about "why did you remove my
We have to track unmanaged datasets in this line of code to look for any dependencies which may prevent deleting a state (dataset clones and such). At the end of the GC, we look for any untagged filesystem datasets (because some datasets can't be destroyed immediately, only untagged) and remove them one after another to "clean" the system.
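The untag-then-sweep flow described above could be sketched roughly like this (all type and function names here are assumptions for illustration; the real logic lives in zsys's internal/machines package):

```go
package main

import "fmt"

// fsDataset is a simplified stand-in for a filesystem dataset carrying
// a zsys state tag; field names are assumptions for this sketch.
type fsDataset struct {
	Name string
	Tag  string // empty means untagged
}

// sweepUntagged models the end of the GC pass: any filesystem dataset
// left untagged is removed to "clean" the system.
func sweepUntagged(datasets []fsDataset) (kept, removed []string) {
	for _, d := range datasets {
		if d.Tag == "" {
			removed = append(removed, d.Name)
		} else {
			kept = append(kept, d.Name)
		}
	}
	return kept, removed
}

func main() {
	datasets := []fsDataset{
		{Name: "rpool/ROOT/ubuntu_abcd", Tag: "state1"},
		{Name: "rpool/USERDATA/pub", Tag: ""}, // untagged: swept, as in this bug
	}
	kept, removed := sweepUntagged(datasets)
	fmt.Println(kept, removed)
}
```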
I think we should just leave alone any filesystem dataset which is its own space: meaning it is not a clone, and none of its clones/snapshots (or snapshots on clones) is attached to any system state for any machine. We lose the GC capabilities for other machines this way, but this is a reasonable tradeoff. This needs a little bit of thought. Sorry again for your experience of unexpected dataset deletion. Fortunately, we have a lot of tests covering various cases in this area (https://github.com/ubuntu/zsys/blob/master/internal/machines/machines_test.go#L1120, https://github.com/ubuntu/zsys/blob/master/internal/machines/machines_test.go#L939, https://github.com/ubuntu/zsys/blob/master/internal/machines/internal_test.go#L420, https://github.com/ubuntu/zsys/blob/master/internal/machines/internal_test.go#L509), and you can see that there is a TODO we need to change: https://github.com/ubuntu/zsys/blob/master/internal/machines/machines_test.go#L1192, to say "detached user datasets should not be collected". I'll open another bug to track the user deletion strategy with this new approach. What do you think?
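A minimal sketch of the proposed rule (the type and field names are invented for this sketch and do not match zsys's internal model): a filesystem dataset is left alone when it is not a clone and none of its snapshots or clones, including snapshots on clones, is attached to any machine state:

```go
package main

import "fmt"

// dataset is an invented, simplified representation for this sketch.
type dataset struct {
	name            string
	isClone         bool      // has an origin, i.e. was cloned from a snapshot
	dependents      []dataset // its snapshots and clones
	attachedToState bool      // referenced by a system state on some machine
}

// isOwnSpace implements the proposed GC exemption: the dataset is not
// a clone, and nothing in its dependency chain belongs to any state.
func isOwnSpace(d dataset) bool {
	if d.isClone || d.attachedToState {
		return false
	}
	for _, dep := range d.dependents {
		if dep.attachedToState {
			return false
		}
		for _, sub := range dep.dependents { // snapshots on clones
			if sub.attachedToState {
				return false
			}
		}
	}
	return true
}

func main() {
	personal := dataset{name: "rpool/media"} // no clones, nothing tied to a state
	fmt.Println(isOwnSpace(personal))        // true: left alone by the GC
}
```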
Some notes after looking a little bit at:
Lines 421 to 444 in e027ea8
This is for filesystem datasets: zsys/internal/machines/machines.go Lines 304 to 306 in e027ea8
This is for snapshot datasets: zsys/internal/machines/machines.go Lines 304 to 306 in e027ea8
We can append them to a list and skip over them when inflating the AllUserDatasets list so that they end up in the unmanaged one: zsys/internal/machines/machines.go Lines 329 to 334 in e027ea8
EDIT: this is slightly more complex and needs a second pass, because we can find a snapshot (same name) or a filesystem dataset clone associated with one part of the system; all snapshots and filesystem datasets then become "managed" so that the user can see the whole history (this is typically the case for a deleted user). The question is about the warning "Couldn't find any association for user dataset". I feel it should still be there (this is processed when refreshing the internal machine representation); maybe the wording can be slightly changed? Should we downgrade it to INFO (as we are starting to expect it as a valid use case)? Any opinions are welcome!
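The second-pass rule described in that EDIT could look roughly like this (names invented for this sketch): if any snapshot or clone in a dataset's family is associated with a system state, the whole family is promoted to managed so the user keeps the full history:

```go
package main

import "fmt"

// member is one entry in a dataset family: the filesystem dataset
// itself plus its snapshots and clones. Invented for illustration.
type member struct {
	name       string
	associated bool // linked to a system state (e.g. a deleted user's history)
}

// familyManaged reports whether the whole family must be treated as
// managed: a single associated member pulls everyone in.
func familyManaged(family []member) bool {
	for _, m := range family {
		if m.associated {
			return true
		}
	}
	return false
}

func main() {
	family := []member{
		{name: "rpool/USERDATA/bob_uvw", associated: false},
		{name: "rpool/USERDATA/bob_uvw@autozsys_1", associated: true},
	}
	fmt.Println(familyManaged(family)) // true: the deleted user's history stays visible
}
```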
Thanks, Didier! I will have to think about this and get back to you with the reference to the convention. Today is International Workers' Day ✊ so thanks for your work until here indeed!
No worries! Thanks for raising this :) Keep me posted once you have put some thought into it. Just a disclaimer that in the following days I may not be as available as I would like, due to personal reasons, but I'll get back to you :)
Thank you for adding these changes and inventing this mechanism. I am sure
it will help a great deal in avoiding accidental data loss in the future.
On Fri, 22 May 2020 at 16:34, Didier Roche wrote:
Closed #103 via 16b68b8.
Yeah :) And we found a way to still deal with deleted users (#131), so the next release will fix this issue without regressing the other use case :)
It seems that many people are running into this problem and losing data. For newcomers to ZFS, it is a fairly obvious thing to follow the existing naming convention and place their datasets under USERDATA.
I agree with @ianmccul. I also used rpool/USERDATA/user, as that seemed like the logical place on a fresh install.
Sorry to poke my nose in here, and if this has been discussed elsewhere and decided against, please feel free to mark this as off-topic. Wouldn't a custom ZFS option/property be a way to mark which datasets zsys manages?
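The opt-in idea could be sketched like this (the property name `local.zsys:managed` below is entirely hypothetical — zsys does not define it; this only illustrates the suggestion that datasets without an explicit marker would be left alone):

```go
package main

import "fmt"

// managedByProperty illustrates an opt-in scheme: a dataset is only
// considered managed when it explicitly carries a marker user property.
// The property name is hypothetical, not one zsys actually uses.
func managedByProperty(props map[string]string) bool {
	return props["local.zsys:managed"] == "on"
}

func main() {
	fmt.Println(managedByProperty(map[string]string{"local.zsys:managed": "on"})) // opted in
	fmt.Println(managedByProperty(map[string]string{}))                           // default: left alone
}
```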
Isn't that fixed by
Oh no, the same thing just deleted my home dataset with a week of work on it 😱 Is there any way I can get it back? This is really bad default behaviour. I followed https://talldanestale.dk/2020/04/06/zfs-and-homedir-encryption/ and I manually set
On Ubuntu 21.10, zsys just destroyed some datasets on a zpool on a secondary hard drive: lpool/USERDATA/user/www and lpool/USERDATA/user/dev. I lost a week of hard work. @didrocks, how can this be possible?
@didrocks Please reopen this issue. Your services are deleting the data of your users.
This might be helpful in case of issues with backups.
Describe the bug

When custom datasets are present in the system, they are not recognised by `zsys` and are scheduled for destruction. On `apt` actions, the `zsys-commit` and `zsys-gc` journals report:

`zsys-gc` further informs about the impossible destruction of newly created datasets, as they are in use:

After the next reboot, the dataset will not yet be mounted, and can be destroyed:
To Reproduce

Steps to reproduce the behavior: create a pool `epool` and, recursively, other datasets below it, like `epool/USERDATA`, `epool/USERDATA/pub` etc.

Expected behavior
The unmanaged datasets are left intact.
The `/tmp/report` generated as the ubuntu user: https://gist.github.com/almereyda/8c2af14ba9de5c26d5309d75ae654f40
Installed versions:
Additional context
Early information about the case was documented in https://askubuntu.com/questions/1232637