Adding an enhancement for integrating day0 kmods with KMM. #957

Closed
wants to merge 5 commits into from

Conversation

ybettan
Collaborator

@ybettan ybettan commented Jan 10, 2024

openshift-ci bot commented Jan 10, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ybettan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

netlify bot commented Jan 10, 2024

Deploy Preview for openshift-kmm ready!

| Name | Link |
|---|---|
| 🔨 Latest commit | f666db4 |
| 🔍 Latest deploy log | https://app.netlify.com/sites/openshift-kmm/deploys/65a5071b60494700086c5e2b |
| 😎 Deploy Preview | https://deploy-preview-957--openshift-kmm.netlify.app/ |
To edit notification comments on pull requests, go to your Netlify site configuration.

@ybettan
Collaborator Author

ybettan commented Jan 10, 2024

Here is a link describing the full process on how to build the first custom ISO and container image to build the cluster using assisted-installer: https://github.com/ybettan/image-composer/blob/main/CUSTOMIZE_RHCOS.md

@codecov-commenter

codecov-commenter commented Jan 10, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (62a631c) 76.49% compared to head (51f3c33) 76.47%.
Report is 4 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #957      +/-   ##
==========================================
- Coverage   76.49%   76.47%   -0.02%     
==========================================
  Files          61       62       +1     
  Lines        5675     5684       +9     
==========================================
+ Hits         4341     4347       +6     
- Misses       1115     1117       +2     
- Partials      219      220       +1     

@ybettan ybettan force-pushed the day0 branch 3 times, most recently from 3b62ced to 80b41c1 Compare January 11, 2024 10:28
docs/enhancements/0002_day0-integration.md Outdated Show resolved Hide resolved
@yevgeny-shnaidman
Member

General remark: I still think that KMM should not be the one responsible for updating the MachineConfig

@pcolledg-amd
Contributor

pcolledg-amd commented Jan 11, 2024

I'd suggest that the RHCOS version string (i.e., 414.92.202312311229-0) in the ModuleDay0 CR be an internally resolved templated variable, à la KMM+DTK's ${KERNEL_FULL_VERSION}, or even calculated from ${KERNEL_FULL_VERSION}.

I don't know if the RHCOS version string exactly identifies the kernel version (which is what matters to kmods). For example, if realtime kernel support were completed, it would presumably have the same RHCOS version string due to the packages having been included in the same image. Of course, the ostree/container image could include kmods compiled for multiple kernel versions, and there is value in that, but that somewhat deviates from design assumptions inherent in KMM's use of ${KERNEL_FULL_VERSION}.

Presumably there is no way to label the autogenerated MachineConfig so users writing their MachineConfigPool CRs could use selectors with the same level of convenience.

@ybettan ybettan force-pushed the day0 branch 5 times, most recently from d0f742d to 51f3c33 Compare January 14, 2024 13:29
MCO uses `MachineConfigPool`s to tie an MC to the nodes it should be
applied to.

KMM should only care about which MC it should act upon.

Signed-off-by: Yoni Bettan <[email protected]>
@ybettan
Collaborator Author

ybettan commented Jan 14, 2024

> General remark: I still think that KMM should not be the one responsible for updating the MachineConfig

@yevgeny-shnaidman If we put it in KMM then we have 2 options:

  1. We add a flag that registers the new controller only if explicitly requested
    • Pros: We only need a single operator to manage both day2 and day0 kmods
    • Cons: The operator becomes more complex
  2. We create a new kmm-day0 image (same as kmm-hub) for that controller
    • Pros: We keep the separation of concerns
    • Cons: A user who wishes to manage both day0 and day2 kmods will need to install 2 operators

Do you still think it shouldn't be in KMM even if it is in a separate image?
If you think it doesn't belong here then let's schedule a meeting with @qbarrand, @mresvanis and @bthurber and decide where to put it.

`/var/lib/firmware` isn't in the default lookup path for firmware.

Signed-off-by: Yoni Bettan <[email protected]>
Signed-off-by: Yoni Bettan <[email protected]>
When a `MachineConfig` uses `osImageURL`, it is called
[image-layering](https://docs.openshift.com/container-platform/4.14/post_installation_configuration/coreos-layering.html).

When image-layering is used, the OCP layer is being "detached" from the underlying

Is this still the case? I was under the assumption that it would be addressed in future releases.

Collaborator Author

Me too. I haven't checked its status recently. Will ask.

Collaborator Author

@ybettan ybettan Jan 15, 2024

It is still the same behavior indeed.
It looks like the proposed experience may duplicate some of the effort planned in https://issues.redhat.com/browse/MCO-665.

@romfreiman Do you think we should avoid upgrading the nodes after a cluster upgrade (as proposed in this PR) due to the recent info about planned action in the MCO space?

I'd rather not have 2 mechanisms to achieve the same thing. It will also introduce more reboots: 10 minutes per bare-metal server.

Collaborator Author

Then I will drop this "improvement" from the proposal.
FYI @qbarrand

openshift-ci bot commented Jan 15, 2024

@ybettan: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| ci/prow/check-commits-count | f666db4 | link | true | /test check-commits-count |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

* Removed the `osMapping` and the `literal` from the API.
* Described how the `ModuleDay0` is constructed by KMM.
* Added a build example at Yevgeni's request.

Signed-off-by: Yoni Bettan <[email protected]>
@ybettan
Collaborator Author

ybettan commented Jan 15, 2024

@pcolledg-amd Thank you for your review.

> I'd suggest that the RHCOS version string (i.e., 414.92.202312311229-0) in the ModuleDay0 CR be an internally resolved templated variable, à la KMM+DTK's ${KERNEL_FULL_VERSION}, or even calculated from ${KERNEL_FULL_VERSION}.

Are you referring to the container image (quay.io/ybettan/rhcos:414.92.202312311229-0) or to the os literal ("Red Hat Enterprise Linux CoreOS 414.92.202312311229-0 (Plow)")?

In KMM, we use those templates when a user targets multiple nodes with different kernels using a regexp; in that case we need to template the kernel image to adjust it to each node. In the new day0 flow, a `MachineConfig` will target nodes that all run the exact same OS, so I am not sure I see the point of templating it.
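
To illustrate the day2 behavior described above, here is a minimal sketch of regexp-based mapping with `${KERNEL_FULL_VERSION}` substitution. It is not KMM's actual implementation: the `KERNEL_MAPPINGS` list, the `resolve_image` helper, and the `example.org` image names are hypothetical and only loosely modelled on the shape of a kernel mapping (a regexp plus a templated container image).

```python
import re
from string import Template

# Hypothetical mappings for illustration only; the first matching entry wins.
KERNEL_MAPPINGS = [
    {
        "regexp": r"^.+\.el9_2\.x86_64$",
        "containerImage": "example.org/kmod:${KERNEL_FULL_VERSION}",
    },
    {
        "regexp": r"^.+$",
        "containerImage": "example.org/kmod-fallback:${KERNEL_FULL_VERSION}",
    },
]


def resolve_image(kernel_version, mappings=KERNEL_MAPPINGS):
    """Return the image for the first mapping whose regexp matches the kernel.

    The ${KERNEL_FULL_VERSION} placeholder is replaced with the node's kernel
    version, so one mapping can serve nodes running different kernels.
    Returns None when no mapping matches.
    """
    for mapping in mappings:
        if re.match(mapping["regexp"], kernel_version):
            template = Template(mapping["containerImage"])
            return template.substitute(KERNEL_FULL_VERSION=kernel_version)
    return None


if __name__ == "__main__":
    # Two nodes with different kernels resolve to different images.
    for kernel in ("5.14.0-284.40.1.el9_2.x86_64", "6.1.0-generic"):
        print(kernel, "->", resolve_image(kernel))
```

In a day0 flow where every targeted node runs the same OS, the loop would always resolve to a single image, which is the reason given above for questioning the need for templating there.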

@ybettan
Collaborator Author

ybettan commented Feb 8, 2024

After some discussions, we won't introduce an MCO-KMM integration for now, therefore, closing in favor of #1004

/close

openshift-ci bot commented Feb 8, 2024

@ybettan: Closed this PR.

In response to this:

After some discussions, we won't introduce an MCO-KMM integration for now, therefore, closing in favor of #1004

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot closed this Feb 8, 2024
6 participants