CORS-3741: [Nutanix] allow multi-subnets in Machine providerSpec and failureDomain configuration #2077

Open
wants to merge 1 commit into
base: master

yanhua121
Contributor

@yanhua121 yanhua121 commented Oct 28, 2024

CORS-3741
Nutanix: allow multi-subnets in Machine providerSpec and failureDomain configuration

Contributor

openshift-ci bot commented Oct 28, 2024

Hello @yanhua121! Some important instructions when contributing to openshift/api:
API design plays an important part in the user experience of OpenShift and as such API PRs are subject to a high level of scrutiny to ensure they follow our best practices. If you haven't already done so, please review the OpenShift API Conventions and ensure that your proposed changes are compliant. Following these conventions will help expedite the api review process for your PR.

@openshift-ci openshift-ci bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Oct 28, 2024
@yanhua121 yanhua121 force-pushed the nutanix-multi-subnets branch 2 times, most recently from 371b5d3 to 66fbeb9 Compare October 29, 2024 17:20
@yanhua121
Contributor Author

/retest

@yanhua121
Contributor Author

/assign @JoelSpeed

Contributor

@JoelSpeed JoelSpeed left a comment

This change should be introduced behind a feature gate. Please take a look at our docs: https://github.com/openshift/enhancements/blob/master/dev-guide/featuresets.md

config/v1/types_infrastructure.go (resolved)
config/v1/types_infrastructure.go (resolved)
@yanhua121 yanhua121 changed the title Nutanix: allow multi-subnets in Machine providerSpec and failureDomain configuration CORS-3741: [Nutanix] allow multi-subnets in Machine providerSpec and failureDomain configuration Oct 30, 2024
@openshift-ci-robot

@yanhua121: An error was encountered searching for bug CORS-3741 on the Jira server at https://issues.redhat.com/. No known errors were detected, please see the full error message for details.

Full error message. No response returned: Get "https://issues.redhat.com/rest/api/2/issue/CORS-3741": GET https://issues.redhat.com/rest/api/2/issue/CORS-3741 giving up after 5 attempt(s)

Please contact an administrator to resolve this issue, then request a bug refresh with /jira refresh.

In response to this:

Nutanix: allow multi-subnets in Machine providerSpec and failureDomain configuration

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Contributor

openshift-ci bot commented Oct 30, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: yanhua121
Once this PR has been reviewed and has the lgtm label, please ask for approval from joelspeed. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

// +listMapKey=type
// +kubebuilder:validation:MaxItems=32
// +listType=atomic
// +kubeubilder:validation:XValidation:rule="self.all(x, self.exists_one(y, x == y))",message="each subnet must be unique"
Contributor

Typo in this one, sorry.

Suggested change
- // +kubeubilder:validation:XValidation:rule="self.all(x, self.exists_one(y, x == y))",message="each subnet must be unique"
+ // +kubebuilder:validation:XValidation:rule="self.all(x, self.exists_one(y, x == y))",message="each subnet must be unique"

Contributor Author

Fixed.

Contributor Author

@yanhua121 yanhua121 Oct 30, 2024

Actually, adding the suggested validation rule caused the ci/prow/verify-crd-schema test to fail. When I tried to apply the "infrastructure" CRD to my local OCP cluster, I got the same error:

The CustomResourceDefinition "infrastructures.config.openshift.io" is invalid:
spec.validation.openAPIV3Schema.properties[spec].properties[platformSpec].properties[nutanix].properties[failureDomains].items.properties[subnets].x-kubernetes-validations[0].rule: Forbidden: estimated rule cost exceeds budget by factor of more than 100x (try simplifying the rule, or adding maxItems, maxProperties, and maxLength where arrays, maps, and strings are declared) ......

So I removed this validation rule.

Contributor Author

@JoelSpeed Can you take another look?

Contributor

Is it because the parent slice, FailureDomains within NutanixPlatformSpec, does not have a maxItems? Do you know how many values we might expect in a list of FailureDomains? We could potentially impose a limit retrospectively, thanks to ratcheting validation.
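
For illustration only, here is a rough Go sketch of why the uniqueness rule is expensive. It mirrors the shape of `self.all(x, self.exists_one(y, x == y))`, which compares every element against every other element, and multiplies that by the number of failure domains the rule could run in. The function names and the cost model are assumptions made for this sketch; it is not the API server's actual CEL cost estimator.

```go
package main

import "fmt"

// uniqueByExistsOne mirrors the shape of the CEL rule
// self.all(x, self.exists_one(y, x == y)): for every element x it scans
// the whole list and requires exactly one match. It also returns the
// number of comparisons performed, which is len(items) squared here
// because this sketch never short-circuits.
func uniqueByExistsOne(items []string) (unique bool, comparisons int) {
	unique = true
	for _, x := range items {
		matches := 0
		for _, y := range items {
			comparisons++
			if x == y {
				matches++
			}
		}
		if matches != 1 {
			unique = false
		}
	}
	return unique, comparisons
}

// worstCaseComparisons is a hypothetical, simplified cost model: the rule
// runs once per failure domain, and each run is quadratic in the number
// of subnets. Without a bound on the parent list there is no finite
// value to plug in for maxFailureDomains.
func worstCaseComparisons(maxSubnets, maxFailureDomains int) int {
	return maxSubnets * maxSubnets * maxFailureDomains
}

func main() {
	subnets := []string{"subnet-a", "subnet-b", "subnet-a"}
	ok, n := uniqueByExistsOne(subnets)
	fmt.Println(ok, n) // prints: false 9

	// With 32 subnets per failure domain and an assumed bound of 256
	// failure domains, the worst case is bounded.
	fmt.Println(worstCaseComparisons(32, 256)) // prints: 262144
}
```

The point of the sketch is only that the subnets bound alone does not cap the total work; the enclosing FailureDomains list needs a bound as well before an estimate can be finite.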

Contributor Author

With the extra rule below, ci/prow/verify-crd-schema and all the e2e test runs failed, as did my local test, with the error mentioned above: "infrastructures.config.openshift.io" is invalid.
Without this rule (the current code change), all the tests passed, as did my local test.

// +kubebuilder:validation:XValidation:rule="self.all(x, self.exists_one(y, x == y))",message="each subnet must be unique"

Contributor Author

@JoelSpeed Can you see if the PR is approvable? I hope it can merge today so as to unblock the other 3 dependent PRs review process.

Contributor

> The NutanixMachineProviderConfig can optionally refer to one FailureDomain by name. The FailureDomain configuration is in the Infrastructure CR.

And this field is being added to the Infrastructure CR. As I mentioned, you need to limit the FailureDomains slice that this field exists within, so that you can add the rule back.

> @JoelSpeed Can you see if the PR is approvable? I hope it can merge today so as to unblock the other 3 dependent PRs review process.

I previously requested that this feature be put behind a feature gate. I don't know if you saw that?

Contributor Author

@JoelSpeed I've made the suggested changes, and the ci/prow/verify-crd-schema test passed. Please take another look.

Contributor Author

Added the feature gate as required.

@yanhua121 yanhua121 force-pushed the nutanix-multi-subnets branch 4 times, most recently from f5ec960 to 8e91447 Compare October 31, 2024 18:14
Contributor

openshift-ci bot commented Oct 31, 2024

@yanhua121: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@JoelSpeed
Contributor

New features must now be implemented behind feature gates, especially this close to branching.

Also, validation ratcheting is a new feature, so we will need to test it against the new failure domains limit. For now, let's see if we can ask QE to create a cluster, add 257 failure domains, then upgrade to this new schema and observe what happens.
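
As a rough mental model of what that test would exercise, the Go sketch below treats ratcheting as "an over-limit value is tolerated on update only if the field is unchanged". The function names, the `fd-N` naming, and the limit of 256 are assumptions for illustration; the real behaviour lives in the API server and may differ in detail, which is why the QE test is still needed.

```go
package main

import (
	"fmt"
	"reflect"
)

// validateMaxItemsRatcheting is a simplified, hypothetical model of
// ratcheting for a maxItems limit. oldItems is nil on create.
func validateMaxItemsRatcheting(oldItems, newItems []string, maxItems int) error {
	if len(newItems) <= maxItems {
		return nil
	}
	// The new value violates the limit. Tolerate it only when the
	// field is unchanged from what is already stored.
	if oldItems != nil && reflect.DeepEqual(oldItems, newItems) {
		return nil
	}
	return fmt.Errorf("must have at most %d items, got %d", maxItems, len(newItems))
}

// makeNames builds n placeholder failure domain names (fd-0, fd-1, ...).
func makeNames(n int) []string {
	names := make([]string, n)
	for i := range names {
		names[i] = fmt.Sprintf("fd-%d", i)
	}
	return names
}

func main() {
	const limit = 256 // assumed limit for this sketch
	existing := makeNames(257)

	// Update that leaves the 257 existing failure domains untouched.
	fmt.Println(validateMaxItemsRatcheting(existing, makeNames(257), limit))
	// Update that adds a 258th failure domain.
	fmt.Println(validateMaxItemsRatcheting(existing, makeNames(258), limit))
	// Create with 257 failure domains (no stored value to ratchet from).
	fmt.Println(validateMaxItemsRatcheting(nil, makeNames(257), limit))
}
```

Under this model, the upgraded cluster with 257 stored failure domains would keep accepting updates that leave the list alone, while creating or growing an over-limit list would be rejected.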

@openshift-ci openshift-ci bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Nov 1, 2024
@@ -1741,6 +1741,7 @@ type NutanixPlatformSpec struct {
// prism element clusters to improve fault tolerance of the cluster.
// +listType=map
// +listMapKey=name
// +openshift:validation:FeatureGateAwareMaxItems:featureGate=NutanixMultiSubnets,maxItems=32
Contributor

Did you intend for this to be 32? Previously we discussed 256. I'm happy with 32 if you are, though.

Labels
size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

3 participants