
Unmanaged datasets destroyed at boot time after import of zpool.cache #103

Closed
almereyda opened this issue Apr 30, 2020 · 18 comments

@almereyda

Describe the bug

When custom datasets are present in the system, they are not recognised by zsys and are scheduled for destruction. During apt actions, the zsys-commit and zsys-gc journals report:

WARNING Couldn't find any association for user dataset epool/USERDATA/edata

zsys-gc further reports that it cannot destroy the newly created datasets while they are in use:

WARNING Couldn't destroy unmanaged user dataset epool/USERDATA/edata: couldn't destroy "epool/USERDATA/edata" and its children: stop destroying dataset on "epool/USERDATA/edata", cannot destroy child: cannot destroy dataset "epool/USERDATA/edata/pub": dataset is busy

After the next reboot, the dataset is not yet mounted and can then be destroyed:

# zpool history -il epool
2020-04-28.23:20:34 [txg:16285] open pool version 5000; software version unknown; uts acacia 5.4.0-26-generic #30-Ubuntu SMP Mon Apr 20 16:58:30 UTC 2020 x86_64 [on acacia]
2020-04-28.23:20:35 [txg:16287] import pool version 5000; software version unknown; uts acacia 5.4.0-26-generic #30-Ubuntu SMP Mon Apr 20 16:58:30 UTC 2020 x86_64 [on acacia]
2020-04-28.23:20:40 zpool import -c /etc/zfs/zpool.cache -aN [user 0 (root) on acacia:linux]
2020-04-28.23:30:44 [txg:16409] destroy epool/USERDATA/edata/pub (600)  [on acacia]
...
2020-04-28.23:31:35 [txg:16451] destroy epool/USERDATA/edata/home (702)  [on acacia]
2020-04-28.23:31:38 [txg:16453] destroy epool/USERDATA/edata/data (607)  [on acacia]
2020-04-28.23:31:41 [txg:16455] destroy epool/USERDATA/edata/src (902)  [on acacia]
2020-04-28.23:31:43 [txg:16457] destroy epool/USERDATA/edata (70)  [on acacia]

To Reproduce
Steps to reproduce the behavior:

  1. Create a fresh install of Ubuntu 20.04 using the experimental ZFS support.
  2. Create an encrypted pool epool and, recursively below it, other datasets such as epool/USERDATA, epool/USERDATA/pub etc.
  3. Transfer several TBs into your new pool.
  4. Reboot the system and have the pool's data destroyed automatically.

Expected behavior

The unmanaged datasets are left intact.

ubuntu user /tmp/report
https://gist.github.com/almereyda/8c2af14ba9de5c26d5309d75ae654f40

Installed versions:

  • OS: Ubuntu 20.04
  • Zsysd running version: 0.4.5

Additional context
Early information about the case was documented in https://askubuntu.com/questions/1232637

@didrocks
Member

didrocks commented Apr 30, 2020

Thanks for the bug report!

The only place in the pool hierarchy where people shouldn’t manually create datasets is <pool>/USERDATA/. This is where zsys has to garbage collect, to free up space, any datasets not associated with any system state. We need to do that in that particular place because there are a lot of interactions between multiple machines, where we end up with datasets there that are up for destruction. This is what is covered by the hundreds of tests we have in ZSys (the same goes for system states, but those have to be clones of other datasets associated to root).

The rest of the pool is yours (rpool/edata, for instance, or rpool/home will never be destroyed and is kept unmanaged by ZSys). I agree that the ZSys warning log is wrong though: it shouldn’t report those as unmanaged datasets on the to-be-deleted list.

A set of blog posts, from basic usage to advanced topics, is going to be published on how ZSys interacts with the system. I will be sure to cover this in them, and maybe even in the FAQ (if you are interested in drafting a page, I will happily merge it into the repo).

EDIT: Another way to see it is to ignore any dataset that is not a clone of any user state of any machine detected by ZSys. The issue with that approach is that people who reinstall their machine without resetting their pool, and who don’t know how ZFS works, will never get an opportunity to reclaim this additional space, but it would be doable. We need to think about the impact of such a decision.

@almereyda
Author

almereyda commented Apr 30, 2020

Thank you for the quick reaction.

Please excuse if this is a naive perspective, but given the assumptions that

  • using a USERDATA dataset is a convention and good practice on ZFS and
  • the custom pool epool is neither of the system pools bpool nor rpool,

I do not expect zsys to have any authority to silently (!) destroy any data, in this case many TBs.

We are also using the USERDATA convention in deployment operations integrated with the paths that libre.sh expects, and this way all user domains would have been destroyed without warning.

This regression might well explain why ZFS support is labelled experimental in the latest LTS, but under no circumstances may it delete data without asking when users follow good practice and sane defaults. The decision to maintain USERDATA datasets under rpool/ might still seem relevant for the zsys use case, but that should not automatically extend to manually created pools following other naming schemes.

A note on how the datasets can be recovered, if there is any chance of that, would also be helpful.

@almereyda
Author

almereyda commented Apr 30, 2020

After some more research, I am looking at the following.

This is the part where simple changes could remove unmanaged datasets from garbage collection:

allDatasets := make([]*zfs.Dataset, 0, len(ms.allSystemDatasets)+len(ms.allPersistentDatasets)+len(ms.allUsersDatasets)+len(ms.unmanagedDatasets))
byOrigin := make(map[string][]string) // list of clones for a given origin (snapshot)
snapshotsByDS := make(map[string][]string) // List of snapshots for a given dataset
log.Debug(ctx, i18n.G("Collect datasets"))
allDatasets = append(allDatasets, ms.allSystemDatasets...)
allDatasets = append(allDatasets, ms.allPersistentDatasets...)
allDatasets = append(allDatasets, ms.allUsersDatasets...)
allDatasets = append(allDatasets, ms.unmanagedDatasets...)

As the name already implies: an unmanaged dataset should not be managed.
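To illustrate, a minimal and purely hypothetical sketch (reusing the names from the snippet above; this is not a tested patch against the zsys code base) would simply leave the unmanaged list out of the candidate set:

// Hypothetical sketch: build the GC candidate list without the unmanaged
// datasets, so garbage collection never considers them in the first place.
allDatasets := make([]*zfs.Dataset, 0,
    len(ms.allSystemDatasets)+len(ms.allPersistentDatasets)+len(ms.allUsersDatasets))
allDatasets = append(allDatasets, ms.allSystemDatasets...)
allDatasets = append(allDatasets, ms.allPersistentDatasets...)
allDatasets = append(allDatasets, ms.allUsersDatasets...)
// ms.unmanagedDatasets is intentionally not appended here.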

To cite the article:

The actual facility is called with the command zsysctl service gc -a, which performs garbage collection on all zsys namespaces present on the system.
...
As Ubuntu developer Didier Roche explained on Twitter, Ubuntu 20.04 systems don't automatically run zsysctl service gc yet. Roche strongly recommends that nobody run the command on any production systems for the time being, since it's a data-destroying feature that's still in alpha.

The IRC logs linked in the thread also mention

Saviq | didrocks: and what does it consider "garbage"?
didrocks | automated snapshots (not taken by you manually) or manual clones, with a decay garbage rules
...
didrocks | I guess you start to underestand why we didn’t enable it by default yet (but we will for 20.04)

Combining the above, it becomes even harder for me to understand (1) why my personal datasets are considered snapshots in the first place, as they did not have any, and (2) why this has been enabled for the LTS release. It seems my very own data is treated as a manual clone, following the naming above.

  • Maybe garbage collection should only handle datasets that carry the _UUID particle in their name hierarchy?

Fortunately the systemd unit does not run with the -a flag.

ExecStart=/sbin/zsysctl service gc

  • Maybe it would be a good compromise to remove the all-encompassing policy of including unmanaged datasets in garbage collection when the all option is not consciously set (see the sketch below)?

Similar to what we find in

// If all is set manual snapshots are considered too
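A minimal sketch of such a guard, again purely hypothetical (the all flag and the list names are my assumptions, not existing zsys variables):

// Hypothetical sketch: only add unmanaged user datasets to the garbage
// collection candidates when the caller explicitly passed -a/--all.
if all {
    allDatasets = append(allDatasets, ms.unmanagedDatasets...)
}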

@didrocks
Member

didrocks commented May 1, 2020

In summary (sorry for the long answer, but I prefer to explain the different causes and consequences, which also helps me think about this issue): this needs a little bit of thought. Sorry for your experience of unexpected dataset deletion, but I think we’ll find a way around it.

  • using a USERDATA dataset is a convention and good practice on ZFS and

Do you have a reference for this? USERDATA has never been a convention on ZFS. ZSys came up with this naming for subspacing while drafting the specification in April 2019. The name was discussed with ZFS upstream (who maintain the ZFS-on-root Ubuntu guide), and that is where we settled on it. So no, this is not an existing convention.

the custom pool epool is none of the system pools bpool nor rpool
Indeed, BUT we had requests from advanced users (as you are as well) to have user dataset capabilities on any pool. We thus have a heuristic to find and merge all user datasets across any available pools:

  • user A can be on rpool and local
  • user B can be shared with machine C and part of any other pool.
    Same for user creation: we prefer an existing USERDATA if any (first on the local pool, then on other pools). Finally, if no USERDATA is present, we look at the local pool.

So, even if a "don’t touch outside of rpool" rule seems trivial at first, the reality is quite different for the above reasons, as per advanced user requests.

It is for all those reasons that we have, until now, seen USERDATA as a reserved namespace. It seems we were wrong (I would still welcome documentation about this convention, and an explanation of how we managed to match a years-old convention by chance).

The decision to maintain USERDATA datasets under rpool/ might still seem relevant for the zsys use case, but that should not automatically extend to manually created pools following other naming schemes.

Your suggestion about limiting this to rpool doesn’t work, for the reasons I gave about the multi-machine and multi-pool user cases. And just as you have run into this issue, I will bet that the next user bug report will be "why did you remove my rpool/USERDATA/awesomedata?". So that won’t fix the issue :)

Combining the above together, it becomes even harder for me to understand, (1) why my personal datasets are considered to be snapshots in the first place, as they did not have any, and (2) why this has been enabled for the LTS release. It seems my very own data is considered as a manual clone, following the naming above.

We have to track unmanaged datasets in this line of code to look for any dependencies which may prevent deleting a state (dataset clones and such). At the end of the GC, we look for any untagged filesystem datasets (because some datasets can’t be destroyed immediately, only untagged) and remove them one after another to "clean" the system.

Maybe garbage collection should only handle datasets that carry the _UUID particle in their name hierarchy?
I foresee the same issue: someone creating rpool/USERDATA/awesome_data and getting their filesystem datasets destroyed.

Maybe it would be a good compromise to remove the all-encompassing policy to include unmanaged datasets in garbage collection, if the all option is not consciously set?

I think we should just leave alone any filesystem dataset which is its own space: meaning it is not a clone, and none of its clones/snapshots (or snapshots on clones) is attached to any system state for any machine.

We lose the GC capabilities for other machines this way, but this is a reasonable tradeoff.
There is then the issue of user deletion: if a user is deleted, we can’t immediately delete all snapshots and clones for this user (because some system reverts need to restore the user). As long as they have one state attached to a system state, this is fine. However, when we detach the last user state from a system state (because of GC), it will fall under this unmanaged dataset definition, and we don’t want to keep it forever.
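To make the proposed rule concrete, here is a self-contained, hypothetical sketch with simplified types (not the real zsys structures): a filesystem dataset is left alone when it is neither a snapshot nor a clone, and nothing derived from it is attached to any system state.

package main

import "fmt"

// dataset is a simplified stand-in for a ZFS dataset; it is not the zsys type.
type dataset struct {
    name       string
    isSnapshot bool
    origin     string     // non-empty: this dataset is a clone of that snapshot
    bootfsOf   []string   // system states this dataset is attached to
    derived    []*dataset // snapshots and clones taken from this dataset
}

// attachedToAnyState reports whether d, or anything derived from it,
// is attached to at least one system state.
func attachedToAnyState(d *dataset) bool {
    if len(d.bootfsOf) > 0 {
        return true
    }
    for _, c := range d.derived {
        if attachedToAnyState(c) {
            return true
        }
    }
    return false
}

// leaveAlone is true when the dataset is its own space: not a snapshot,
// not a clone, and with nothing derived from it attached to a system state.
func leaveAlone(d *dataset) bool {
    return !d.isSnapshot && d.origin == "" && !attachedToAnyState(d)
}

func main() {
    edata := &dataset{
        name:    "epool/USERDATA/edata",
        derived: []*dataset{{name: "epool/USERDATA/edata@manual", isSnapshot: true}},
    }
    fmt.Println(leaveAlone(edata)) // true: nothing here belongs to any system state
}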

This needs a little bit of thought. Sorry again for your experience of unexpected dataset deletion. Fortunately, we have a lot of tests which cover various cases in this area (https://github.com/ubuntu/zsys/blob/master/internal/machines/machines_test.go#L1120, https://github.com/ubuntu/zsys/blob/master/internal/machines/machines_test.go#L939, https://github.com/ubuntu/zsys/blob/master/internal/machines/internal_test.go#L420, https://github.com/ubuntu/zsys/blob/master/internal/machines/internal_test.go#L509), and you can see that there is a TODO we need to change (https://github.com/ubuntu/zsys/blob/master/internal/machines/machines_test.go#L1192) to say "detached user datasets should not be collected".

I'll open another bug to track user deletion strategy with this new approach.

What do you think?

@didrocks
Member

didrocks commented May 1, 2020

Some notes after looking into it a little bit:

  • the "unmanaged" warning is a leftover of the previous wording: we are indeed looping over AllUserDatasets and not UnmanagedDatasets, so this part is correct:

for _, d := range ms.allUsersDatasets {
    if d.IsSnapshot {
        continue
    }
    if d.BootfsDatasets != "" {
        continue
    }
    for rootDataset := range userDatasetsToKeep {
        if d.Name == rootDataset || strings.HasPrefix(d.Name, rootDataset+"/") {
            continue nextDataset
        }
    }
    for _, n := range alreadyDestroyedRoot {
        if strings.HasPrefix(d.Name, n+"/") {
            continue nextDataset
        }
    }
    hasChanges = true
    // We destroy here all snapshots and leaf attached. Snapshots won’t be taken into account, however, we don’t want
    // to try destroying leaves again, keep a list.
    if err := nt.Destroy(d.Name); err != nil {
        log.Warningf(ctx, i18n.G("Couldn't destroy unmanaged user dataset %s: %v"), d.Name, err)
    }

  • We thus need to ensure that this kind of dataset doesn’t end up in AllUserDatasets when loading it, and migrate them to the UnmanagedDatasets list instead.

This is for filesystem datasets:

if !associateWithAtLeastOne {
    log.Warningf(ctx, i18n.G("Couldn't find any association for user dataset %s"), r.Name)
}

This is for snapshot datasets:

if !associateWithAtLeastOne {
    log.Warningf(ctx, i18n.G("Couldn't find any association for user dataset %s"), r.Name)
}

We can append them to a list and skip over them when inflating the AllUserDatasets list so that they end up in the unmanaged one:

for _, d := range flattenedUserDatas {
    if d.CanMount == "off" {
        continue
    }
    machines.allUsersDatasets = append(machines.allUsersDatasets, d)
}
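A hypothetical sketch of that skip, building on the loop above (the unassociated set and the unmanagedDatasets append are my assumptions, not existing zsys fields):

// Hypothetical sketch: remember the names of datasets that had no
// association (collected where the warnings above are logged), then route
// them to the unmanaged list instead of allUsersDatasets.
unassociated := make(map[string]bool)
// ... filled with r.Name wherever "Couldn't find any association" is logged ...

for _, d := range flattenedUserDatas {
    if d.CanMount == "off" {
        continue
    }
    if unassociated[d.Name] {
        machines.unmanagedDatasets = append(machines.unmanagedDatasets, d)
        continue
    }
    machines.allUsersDatasets = append(machines.allUsersDatasets, d)
}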

EDIT: this is slightly more complex and needs a second pass, because we can find a snapshot (same name) or a filesystem dataset clone associated with one part of the system; in that case all snapshots and filesystem datasets then become "managed" so that the user can see the whole history (this is typically the case for a deleted user).

The remaining question is about the warning "Couldn't find any association for user dataset". I feel it should still be there (it is emitted when refreshing the internal machine representation); maybe the wording can be slightly changed? Should we downgrade it to INFO (as we are starting to expect this as a valid use case)? Any opinions are welcome!

@almereyda
Author

Thank you, Didier! I will have to think about this and get back to you with the reference to the USERDATA convention (in use on our machines since Oct. 2017 at the latest), and try to understand the involved mechanics without knowing the code. Please allow me to meditate on that.

Today is international workers day ✊ so thanks for your work until here indeed!

@didrocks
Member

didrocks commented May 1, 2020

No worries! Thanks for raising this :)

Keep me posted once you have given it some thought. Just a disclaimer that in the following days I may not be available as much as I would like, due to personal reasons, but I’ll get back to you :)

@almereyda
Author

almereyda commented May 26, 2020 via email

@didrocks
Member

Yeah :) And we found a way to deal with deleted users as well (#131), so the next release will fix this issue without regressing the other use case :)

@ianmccul

It seems that many people are running into this problem and losing data. For newcomers to ZFS, it is fairly natural to follow the existing naming convention that zfs list shows and name datasets in some additional storage pool as pool/USERDATA/user. If Zsys wants to reserve a name for itself, then it should be something that makes it really obvious the namespace is private to Zsys and that users shouldn't name their datasets like that. 'USERDATA' is a terrible choice. 'ZSYS-MANAGED', or something like that, would be better.

@malachid

malachid commented Jul 7, 2020

I agree with @ianmccul. I also used rpool/USERDATA/user, as that seemed like the logical place on a fresh install.

@Lockszmith-GH

Lockszmith-GH commented Jul 10, 2020

Sorry to stick my nose in here; if this has been discussed elsewhere and decided against, please feel free to mark this as off-topic.

Wouldn't a custom ZFS option/property such as zsys:managed_userdata_dataset=true solve the issue?
Using zfs get -t filesystem,volume zsys:managed_userdata_dataset to identify whether a dataset is relevant or not?
Similar to how zfs_autobackup manages its options.
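For illustration only, a small, self-contained sketch of how such an opt-in check could look when shelling out to the zfs CLI. The property name is the one proposed above, not an existing zsys feature, and isOptedIn is a hypothetical helper:

package main

import (
    "fmt"
    "os/exec"
    "strings"
)

// isOptedIn returns true only if the dataset explicitly carries the
// proposed zsys:managed_userdata_dataset=true user property.
func isOptedIn(dataset string) (bool, error) {
    out, err := exec.Command("zfs", "get", "-H", "-o", "value",
        "zsys:managed_userdata_dataset", dataset).Output()
    if err != nil {
        return false, err
    }
    // An unset user property is reported as "-" by zfs get.
    return strings.TrimSpace(string(out)) == "true", nil
}

func main() {
    ok, err := isOptedIn("epool/USERDATA/edata")
    if err != nil {
        fmt.Println("zfs get failed:", err)
        return
    }
    fmt.Println("managed by zsys:", ok)
}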

@didrocks
Member

Isn’t that fixed by zsys 0.4.6? In addition to the dataset being under USERDATA/, we now expect the zsys bootfs-dataset property to be set before collecting it. Have you experimented with a newer version?

@h-2

h-2 commented May 14, 2021

Oh no, the same thing just deleted my home dataset with a week of work on it 😱 Is there any way I can get it back? This is really bad default behaviour.

I followed https://talldanestale.dk/2020/04/06/zfs-and-homedir-encryption/ and manually set the zsys bootfs-dataset property on the dataset so that it wouldn't be deleted. It worked fine for a week, but today I mistyped my password three times, and after another reboot the dataset was gone completely. Why, oh why? 😭

@loxK

loxK commented Mar 4, 2022

On Ubuntu 21.10, zsys just destroyed some datasets on a zpool on a secondary hard drive: lpool/USERDATA/user/www and lpool/USERDATA/user/dev. I lost a week of hard work.

@didrocks how can this be possible?

@h-2

h-2 commented Mar 7, 2022

@didrocks Please reopen this issue. Your services are deleting the data of your users.

@didrocks
Member

didrocks commented Mar 7, 2022

This discussion has moved to #196, as this has nothing to do with this bug. However, please be aware of #213.

@a0c

a0c commented Apr 10, 2023

This might be helpful in case of issues with backups.
