Add support for --replace-mode=alongside for ostree target #137

Open
wants to merge 1 commit into
base: main

cgwalters
Collaborator

Ironically, our support for --replace-mode=alongside breaks when we're targeting an already extant ostree host: when we first blow away the /boot directory, the ostree stack loses its knowledge that we're in a booted deployment, and will attempt to GC it...

ostreedev/ostree-rs-ext@8fa019b is a key part of the fix for that.

However, a notable improvement we can do here is to grow this whole thing into a real "factory reset" mode, and this will be a compelling answer to
coreos/fedora-coreos-tracker#399

To implement this though we need to support configuring the stateroot and not just hardcode default.
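
As a rough illustration of what "configuring the stateroot" means, here is a minimal sketch, not the code from this PR. It assumes the usual ostree layout where a stateroot's deployments live under ostree/deploy/<stateroot>; the function and constant names are hypothetical.

```rust
use std::path::{Path, PathBuf};

/// Stateroot name used when the caller does not ask for a specific one.
const DEFAULT_STATEROOT: &str = "default";

/// Resolve the stateroot to install into (hypothetical helper):
/// use the requested name if present, otherwise fall back to "default".
fn resolve_stateroot(requested: Option<&str>) -> &str {
    requested.unwrap_or(DEFAULT_STATEROOT)
}

/// Directory under the physical root that holds a stateroot's deployments,
/// assuming the standard ostree/deploy/<stateroot> layout.
fn stateroot_deploy_dir(sysroot: &Path, stateroot: &str) -> PathBuf {
    sysroot.join("ostree").join("deploy").join(stateroot)
}
```

With something like this, an install that passes a stateroot name would land in its own ostree/deploy/<name> tree instead of sharing the one for default.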

@openshift-ci

openshift-ci bot commented Oct 2, 2023

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@omertuc
Contributor

omertuc commented Sep 18, 2024

Sorry @cgwalters , accidentally pushed the rebase to your fork instead of mine

EDIT: undid it, continuing my rebase efforts on https://github.com/omertuc/bootc/tree/137clone

EDIT2: Continuing here

@cgwalters
Collaborator Author

I think you can just take over this PR too if you want, or open a new PR from your fork - either way.

@omertuc
Contributor

omertuc commented Sep 19, 2024

Rebased. Without any changes, I'm facing an issue: on an ostree system, the host's / is an overlay (mounted into the container with -v /:/target), and so the findmnt source for it is overlay rather than a /dev/... and so it trips up lsblk later on.

I'll see how I can tweak it so that it finds the right device

@omertuc
Contributor

omertuc commented Oct 2, 2024

With additional -v /sysroot:/target -v /sysroot:/target/sysroot mounts instead of -v /:/target and --stateroot foo, this seems to work

@omertuc
Contributor

omertuc commented Oct 2, 2024

@cgwalters thoughts on the above mounts? Do we want to require them for install on ostree targets, or should I figure out a way to make this work without them, using just the already-documented install mounts (i.e. /:/target)?

@cgwalters
Collaborator Author

and so the findmnt source for it is overlay rather than a /dev/... and so it trips up lsblk later on

We should learn how to peel that. This is really the same thing as https://bugzilla.redhat.com/show_bug.cgi?id=2308594 and ostreedev/ostree#3198 and containers/composefs#280

Short term the simplest is the same logic as the grub patch - detect overlayfs for / and check if /sysroot exists and is mounted, if so use that.
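
The short-term logic described here could be sketched roughly as follows. This is an illustration under assumptions, not the actual patch: it reads text in the /proc/self/mountinfo format, and the names MountEntry, find_mount and pick_install_target are hypothetical.

```rust
use std::path::{Path, PathBuf};

/// Filesystem type and source of one mount, as listed in mountinfo.
#[derive(Debug, PartialEq)]
struct MountEntry {
    fstype: String,
    source: String,
}

/// Look up `mount_point` in mountinfo-formatted text.
/// The last matching line wins, since later mounts shadow earlier ones.
fn find_mount(mountinfo: &str, mount_point: &str) -> Option<MountEntry> {
    let mut found = None;
    for line in mountinfo.lines() {
        // Everything before " - " is per-mount data; the fifth field is the mount point.
        let Some((pre, post)) = line.split_once(" - ") else { continue };
        if pre.split(' ').nth(4) != Some(mount_point) {
            continue;
        }
        // After the separator come the filesystem type and the mount source.
        let mut post = post.split(' ');
        if let (Some(fstype), Some(source)) = (post.next(), post.next()) {
            found = Some(MountEntry {
                fstype: fstype.to_string(),
                source: source.to_string(),
            });
        }
    }
    found
}

/// If the target root is an overlay and its sysroot is a mount of its own,
/// install against <target>/sysroot; otherwise keep the target as given.
fn pick_install_target(target: &Path, root_fstype: &str, sysroot_is_mounted: bool) -> PathBuf {
    if root_fstype == "overlay" && sysroot_is_mounted {
        target.join("sysroot")
    } else {
        target.to_path_buf()
    }
}
```

The point of the fallback is that the sysroot mount has a real block device as its source, so later device lookups have something to work with.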

@omertuc
Contributor

omertuc commented Oct 8, 2024

and so the findmnt source for it is overlay rather than a /dev/... and so it trips up lsblk later on

We should learn how to peel that. This is really the same thing as bugzilla.redhat.com/show_bug.cgi?id=2308594 and ostreedev/ostree#3198 and containers/composefs#280

Short term the simplest is the same logic as the grub patch - detect overlayfs for / and check if /sysroot exists and is mounted, if so use that.

OK. Changed it so that when the target rootfs is an overlay, we'll implicitly try targeting <original_target>/sysroot instead.

It wasn't working at first and was a bit of a headache for me to debug because apparently if you mount /:/target then inside the container /target/sysroot is read-only by default, and so ensure_dir_labeled was failing, as opposed to when you mount /sysroot:/target directly, in which case it's not read-only. It took me a while to track that down while chasing red herrings, and I'm still not sure what's responsible for this behavior (kernel? podman?). Once I realized it, I simply moved ensure_dir_labeled to run only after your added let _ = crate::utils::open_dir_remount_rw... and then the rest just worked.
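
For anyone debugging a similar failure, one way to confirm that a path is mounted read-only is to look at its per-mount options in /proc/self/mountinfo. The sketch below is only a diagnostic aid under that assumption; is_mounted_read_only is a hypothetical name, and it does not say which component set the flag.

```rust
/// Report whether `mount_point` is mounted read-only according to
/// mountinfo-formatted text. Returns None if the path is not a mount point.
/// The last matching line wins, since later mounts shadow earlier ones.
fn is_mounted_read_only(mountinfo: &str, mount_point: &str) -> Option<bool> {
    let mut result = None;
    for line in mountinfo.lines() {
        let mut fields = line.split(' ');
        // The fifth field is the mount point; the sixth holds the per-mount options.
        if fields.nth(4) != Some(mount_point) {
            continue;
        }
        if let Some(opts) = fields.next() {
            result = Some(opts.split(',').any(|opt| opt == "ro"));
        }
    }
    result
}
```

Feeding it the contents of /proc/self/mountinfo from inside the container for /target and /target/sysroot would show whether the two mounts differ.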

Current code might need a bit of touch-ups, but do you think the direction of the code in its current state is good? Should I clean it up and undraft?

@cgwalters
Collaborator Author

It wasn't working at first and was a bit of a headache for me to debug because apparently if you mount /:/target then inside the container /target/sysroot is read-only by default, and so ensure_dir_labeled was failing, as opposed to when you mount /sysroot:/target directly, in which case it's not read-only.

I think that's possibly because it's bootc that's special casing mounting /sysroot read-write - that's how we do it outside of a container at least.

Collaborator Author

@cgwalters cgwalters left a comment


Thanks for picking this up!!

lib/src/install.rs
lib/src/install.rs (outdated)
lib/src/lsm.rs (outdated)
lib/src/utils.rs (outdated)
@omertuc force-pushed the install-existing-ostree branch 4 times, most recently from 3561a64 to ed94f1e, October 14, 2024 17:35
@omertuc force-pushed the install-existing-ostree branch 3 times, most recently from d392280 to eed91ff, October 21, 2024 15:05
@omertuc
Contributor

omertuc commented Oct 22, 2024

The current experience is:

echo "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIOTVytyhnSfX20smAsNKYG5Zpz6vSzDZu22S8PCDJ2Iw omer" > authkeys

sudo podman run --rm --privileged \
  -v $PWD/authkeys:/authkeys \
  -v /dev:/dev \
  -v /var/lib/containers:/var/lib/containers \
  -v /:/target \
  --pid=host --security-opt label=type:unconfined_t \
  -e RUST_LOG=trace \
  quay.io/otuchfel/bootc:latest \
  bootc install to-existing-root --replace alongside --acknowledge-destructive \
    --stateroot foobar --root-ssh-authorized-keys /authkeys

Following that, when I reset, it immediately boots into the new foobar stateroot, and there are no grub boot entries for the original one:

image

I assume this is not our desired experience, and I should look into having the original boot entry preserved?

@cgwalters
Collaborator Author

It's a great question. I think for install --replace=alongside, that is indeed the expectation by default.

Let's then target merging this as is?

I think then what we should look at is weaving in this functionality into #404 right?

That said we may end up wanting to expose this as something like bootc install --retain or something? At least in the ostree case it's trivial to do, we just don't blow away /boot 😄

The super messy thing is interoperating with non-ostree-ready bootloader setups so --retain would probably have to just error out in that case for now.

IOW:

  • Let's merge as is
  • Look at --retain as a followup?

@omertuc marked this pull request as ready for review, October 22, 2024 16:56
@omertuc
Contributor

omertuc commented Oct 22, 2024

sgtm

@cgwalters changed the title from "WIP: Add support for --replace-mode=alongside for ostree target" to "Add support for --replace-mode=alongside for ostree target", Oct 22, 2024
@cgwalters
Collaborator Author

OK so we should have CI coverage here...I think it's actually as easy as doing another install after an initial one without doing the manual "wipe ostree" stuff we have in one of the install tests right?

@omertuc force-pushed the install-existing-ostree branch 2 times, most recently from f465ae7 to 9876584, October 23, 2024 09:23
@omertuc
Contributor

omertuc commented Oct 23, 2024

OK so we should have CI coverage here...I think it's actually as easy as doing another install after an initial one without doing the manual "wipe ostree" stuff we have in one of the install tests right?

Not sure I agree, ideally we should actually boot into the installed ostree system first? But getting reboots to work with the current tests-integration/src/install.rs harness will be tricky

But anyway can't hurt having a test like you suggested, even without a reboot

@cgwalters
Collaborator Author

Yes, testing with reboots is going to require some more infrastructure here.

@omertuc
Contributor

omertuc commented Oct 23, 2024

Yes, testing with reboots is going to require some more infrastructure here.

I've added a test but as suspected, the test passes even on the main branch, so it's not very helpful for verifying this PR without a proper reboot. Should we keep it anyway?

Scratch that, I'm silly and forgot to remove the call to reset_root.

Unscratch that: even after removing the call to reset_root, the test still passes on the main branch 🫤

Ironically our support for `--replace-mode=alongside` breaks
when we're targeting an already extant ostree host, because when
we first blow away the `/boot` directory, this means the ostree
stack loses its knowledge that we're in a booted deployment,
and will attempt to GC it...

ostreedev/ostree-rs-ext@8fa019b
is a key part of the fix for that.

However, a notable improvement we can do here is to grow this
whole thing into a real "factory reset" mode, and this will
be a compelling answer to
coreos/fedora-coreos-tracker#399

To implement this though we need to support configuring the
stateroot and not just hardcode `default`.

Signed-off-by: Omer Tuchfeld <[email protected]>
@cgwalters
Collaborator Author

We talked about this and realized that while the new test passes, it would pass already today because it's actually the "install --stateroot" that makes it work.

The main fix we need here is preserving existing deployments when we detect we're booted via ostree.
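
Detecting "booted via ostree" can be sketched as below. It assumes the /run/ostree-booted marker file that ostree creates on booted systems; the function name and the root parameter are hypothetical, and this is not necessarily how the PR performs the check.

```rust
use std::path::Path;

/// Return true if the system rooted at `root` looks like it was booted via ostree,
/// judged by the presence of the run/ostree-booted marker file.
/// Pass "/" for the running host, or a mounted target root such as "/target".
fn is_ostree_booted(root: &Path) -> bool {
    root.join("run/ostree-booted").exists()
}
```

An installer could branch on this result to decide whether existing deployments should be kept rather than wiped.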

So actually a way we could test this is via our tmt tests instead.

But in the end, again, I'm good to merge as is. Since I wrote this PR I can't approve it; you (or someone else) need to do so.

Labels
area/install Issues related to `bootc install`