
CAPI Multi-tenancy contract proposal for CAPM3 #429

Open
wants to merge 1 commit into main from lentzi90/multi-tenancy

Conversation

lentzi90
Member

@lentzi90 lentzi90 commented May 16, 2024

This adds a proposal for how to implement the CAPI multi-tenancy contract for CAPM3.

Multi-tenancy in this context is defined by the CAPI contract and simply means that CAPM3 is able to accept tenant credentials (as opposed to the current implied access to BareMetalHosts in the same namespace as the Metal3Machines). The credentials are used by CAPM3 to access BareMetalHosts regardless of where they are located.
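As a rough illustration of that wiring (the exact field shape is not final, and all names below are placeholders; the identityRef field is assumed to look similar to CAPO's), the Metal3Cluster would reference a Secret containing a kubeconfig for wherever the BareMetalHosts live:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: bmh-cluster-1-kubeconfig   # placeholder name
  namespace: tenant-1
stringData:
  # kubeconfig granting access to the namespace that holds the tenant's BareMetalHosts
  kubeconfig: |
    # <kubeconfig contents>
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: Metal3Cluster
metadata:
  name: my-cluster
  namespace: tenant-1
spec:
  controlPlaneEndpoint:
    host: 192.168.111.249
    port: 6443
  # Proposed credential reference (shape assumed here): points at the Secret
  # above instead of relying on implied same-namespace access to BMHs.
  identityRef:
    name: bmh-cluster-1-kubeconfig
```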

@metal3-io-bot metal3-io-bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 16, 2024
@metal3-io-bot
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@metal3-io-bot metal3-io-bot requested review from elfosardo and Rozzii May 16, 2024 13:02
@metal3-io-bot metal3-io-bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label May 16, 2024
@lentzi90 lentzi90 force-pushed the lentzi90/multi-tenancy branch from f9b0fb1 to f6ef4c4 Compare May 17, 2024 11:29
@lentzi90 lentzi90 marked this pull request as ready for review May 17, 2024 11:31
@lentzi90 lentzi90 changed the title WIP: Multi-tenancy proposal Multi-tenancy proposal May 17, 2024
@metal3-io-bot metal3-io-bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 17, 2024
@pierrecregut

Credentials can be put at either of two levels:

  • at cluster level
  • at machine level (through machine template)

With the first approach, which you have chosen, there can be only one source of bare-metal machines; with the second, you can have as many sources as machine deployments, and you can smoothly transition between sources by changing the credentials.

It can be interesting to have a cluster that spans different regions for redundancy. It is also useful in some edge scenarios, where you can have a cluster with a centralized control plane and many locations, each with potentially many bare-metal providers.

As far as I know, there is no explicit requirement in the CAPI multi-tenancy contract that there be a single set of credentials.

@lentzi90
Member Author

Interesting point. I'm open to adding the same identityRef to the Metal3MachineTemplate as well. In fact, this would be identical to how CAPO does it. In practice I don't think it changes the proposal much. We can default the identityRef for the Metal3Machines to what is set on the Metal3Cluster and then allow the user to override that in the Metal3MachineTemplate.
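For illustration only (the field name on the template is assumed, mirroring the cluster-level identityRef; it is not final API), the override could look like this, with the Metal3Cluster value used as the default when the template does not set it:

```yaml
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: Metal3MachineTemplate
metadata:
  name: workers-region-b           # placeholder
  namespace: tenant-1
spec:
  template:
    spec:
      # Assumed per-template override; if omitted, CAPM3 would default to the
      # identityRef set on the owning Metal3Cluster.
      identityRef:
        name: bmh-cluster-2-kubeconfig
      image:
        url: http://example.com/images/ubuntu-22.04.qcow2
        checksum: http://example.com/images/ubuntu-22.04.qcow2.sha256sum
```

This would let different machine deployments in the same cluster consume BMHs from different sources, as suggested above.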

@lentzi90 lentzi90 force-pushed the lentzi90/multi-tenancy branch from f6ef4c4 to 8ae7667 Compare May 21, 2024 12:26
@pierrecregut

The title and the user stories give the impression that the proposal solves the multi-tenancy problem. It is true that two clusters can be defined in two namespaces and point to a single namespace of BareMetalHosts. But the users of those clusters will have full power over those BMHs, and the content of the BMH is equivalent to the content of the cluster. Unless you have something like HostClaims hiding the BareMetalHost namespaces, you cannot share BMHs with users you do not trust. In fact, if there are good use cases for this remote access for HostClaim, I do not really see any for BMH.

@pierrecregut

The implementation will be tricky. We need to maintain a watch over a namespace in one cluster that generates events in another cluster, and we need to know when to remove this watch. Probably when the kubeconfig secret is deleted.

@pierrecregut

Some fields in the status are copied back. But the dataTemplate mechanism also relies on labels and annotations. I expect that they will not be copied and that the controller will use the kubeconfig to get the actual values. Is that the case?

@lentzi90
Member Author

From the CAPI perspective this does solve multi-tenancy. Each tenant provides credentials for accessing their BareMetalHosts, similar to how it would work with cloud resources. If two tenants share the same BareMetalHosts, then that is essentially one tenant. This is also the same as for other providers: if you have access to the same OpenStack, AWS or GCP tenant, then you also have control over the same cloud resources.

I understand that this does not solve the same issue as the HostClaim, but I see this as essential. It also does not block implementing HostClaims before or after, as long as it is done with consideration. That was my goal with this: to make the design of the HostClaim easier by extracting this important part.

@Rozzii
Member

Rozzii commented May 27, 2024

/approve

@metal3-io-bot metal3-io-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 27, 2024
@lentzi90
Member Author

/hold
I want to make sure people have a chance to check this and comment before we merge.

@metal3-io-bot metal3-io-bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 27, 2024
@lentzi90
Member Author

lentzi90 commented Jun 7, 2024

Two weeks have passed and we have discussed it offline. I think it is safe to remove the hold.
/hold cancel

@metal3-io-bot metal3-io-bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 7, 2024
@lentzi90
Member Author

lentzi90 commented Jun 7, 2024

/test ?

1 similar comment

@metal3-io-bot
Contributor

@lentzi90: The following commands are available to trigger required jobs:

  • /test markdownlint
  • /test shellcheck
  • /test spellcheck

Use /test all to run the following jobs that were automatically triggered:

  • markdownlint
  • spellcheck

In response to this:

/test ?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Member

@Rozzii Rozzii left a comment

/approve

@metal3-io-bot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Rozzii

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

- `.status.errorCount` on the BMH would become
`.status.bareMetalHost.errorCount`
- `.status.errorMessage` on the BMH would become
`.status.bareMetalHost.errorMessage`
Member

Something to think about: we might expose internal details through error messages.

Member Author

Do you see a risk that this could be sensitive or just something the user might learn to rely on even though they should not? Or that it would be irrelevant to the user?

I'm thinking that we will need to expose some kind of information in any case, and this seems to be similar to what other providers do. In my experience it is seldom something the user can "do" anything with, but it can often be useful for determining whether the problem is a user error or an issue with the infrastructure.

Member

Do you see a risk that this could be sensitive or just something the user might learn to rely on even though they should not? Or that it would be irrelevant to the user?

Yes :)

I don't have an easy solution to this. Errors from Ironic can be vital, e.g. to understand that RootDeviceHints are wrong. Maybe it's something we should simply document.

@dtantsur
Member

dtantsur commented Jul 4, 2024

/cc @zaneb

since you have opinions on multi-tenancy.

@metal3-io-bot
Contributor

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues will close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@metal3-io-bot metal3-io-bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 2, 2024
@tuminoid
Member

tuminoid commented Oct 2, 2024

/remove-lifecycle stale
Lennart will be back soon to continue the work.

@metal3-io-bot metal3-io-bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 2, 2024
@lentzi90 lentzi90 force-pushed the lentzi90/multi-tenancy branch from 8ae7667 to 1198ab9 Compare October 22, 2024 08:11
Member

@zaneb zaneb left a comment

Like Pierre, I'm struggling to think of a real problem this solves.
If you have one I wouldn't block it though.

@lentzi90
Member Author

@pierrecregut

In fact, if there are good use cases for this remote access for HostClaim, I do not really see any for BMH.

What about two users sharing a management cluster but having separate BMH resources?

                                        
 ┌─────────────────────────────────┐    
 │                                 │    
 │ Management cluster              │    
 │                                 │    
 │                                 │    
 │ ┌─────────────┐   ┌───────────┐ │    
 │ │             │   │           │ │    
 │ │ Tenant 1    │   │ Tenant 2  │ │    
 │ │             │   │           │ │    
 └─└─────┬───────┘───└──────┬────┘─┘    
         │                  │           
         │                  │           
         │                  │           
 ┌───────▼───────┐    ┌─────▼────────┐  
 │               │    │              │  
 │ BMH cluster 1 │    │ BMH cluster 2│  
 │               │    │              │  
 │               │    │              │  
 │               │    │              │  
 └───────────────┘    └──────────────┘  
                                        

Some fields in the status are copied back. But the dataTemplate mechanism also relies on labels and annotations. I expect that they will not be copied and that the controller will use the kubeconfig to get the actual values. Is that the case?

Sorry I missed this earlier. Only the rendered secrets would be impacted by this change, since they need to be accessible by BMO. I will update the proposal and clarify this. The idea would be to just render the secrets next to the BMHs. All the templates, claims, etc. would still work as they do today. Does that answer the question?
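To make that concrete, here is a hedged sketch (resource names are hypothetical; the BareMetalHost fields shown are existing BMO API): the rendered Secret would be created in the remote cluster, in the same namespace as the BareMetalHost, so that the BMH's normal secret references keep working:

```yaml
# In the remote BMH cluster, same namespace as the BareMetalHost
apiVersion: v1
kind: Secret
metadata:
  name: my-cluster-workers-abc12-metadata   # hypothetical rendered secret
  namespace: bmh-pool
stringData:
  metaData: |
    local-hostname: my-cluster-workers-abc12
---
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: node-3
  namespace: bmh-pool
spec:
  online: true
  # Existing secret reference on the BMH; the rendered Secret is placed in the
  # same namespace so that BMO can consume it as it does today.
  metaData:
    name: my-cluster-workers-abc12-metadata
  bootMACAddress: "00:5a:91:3f:9a:bd"
  bmc:
    address: redfish+http://192.168.111.1:8000/redfish/v1/Systems/node-3
    credentialsName: node-3-bmc-secret
```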

@lentzi90
Member Author

@zaneb thanks for the comments, I will update the proposal!

Like Pierre, I'm struggling to think of a real problem this solves.
If you have one I wouldn't block it though.

For me, the important part is separating management of BMHs from building clusters. Currently both these things must happen in the same cluster in the same namespace. I.e. if you want to create a cluster out of BMHs you will first have to create the BMHs or ask someone to do it for you in the very namespace you are planning to work with. I want to make it possible to split this up so that one team can set up a BMH management cluster where they handle pools of BMHs. They can then provide credentials to other teams that want to use the BMHs (a rough sketch of what such credentials could look like follows at the end of this comment).

Since there is already a CAPI contract for this, I think our best option is to implement that first. From there I know we can do more, like HostClaims. It would be unfortunate to skip the CAPI contract though, since that would mean our provider would work in a different way than what users are already accustomed to from other providers.

If nothing else, implementing the CAPI contract would simplify our e2e tests by solving metal3-io/cluster-api-provider-metal3#1080
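For the "provide credentials to other teams" part above, here is a minimal sketch of what the BMH management team could hand out (all names and the exact RBAC scope are assumptions, not part of the proposal): a namespaced ServiceAccount in the BMH management cluster, whose kubeconfig then becomes the Secret referenced by identityRef on the tenant side.

```yaml
# In the BMH management cluster, created by the team that owns the hosts
apiVersion: v1
kind: ServiceAccount
metadata:
  name: tenant-1-capm3             # placeholder
  namespace: bmh-pool-tenant-1
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: capm3-consumer
  namespace: bmh-pool-tenant-1
rules:
  # Assumed scope: CAPM3 needs to claim hosts and write provisioning details,
  # and it creates the rendered user-data/meta-data/network-data Secrets
  # next to the BMHs.
  - apiGroups: ["metal3.io"]
    resources: ["baremetalhosts"]
    verbs: ["get", "list", "watch", "update", "patch"]
  - apiGroups: [""]
    resources: ["secrets"]
    verbs: ["get", "list", "watch", "create", "update", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: capm3-consumer
  namespace: bmh-pool-tenant-1
subjects:
  - kind: ServiceAccount
    name: tenant-1-capm3
    namespace: bmh-pool-tenant-1
roleRef:
  kind: Role
  name: capm3-consumer
  apiGroup: rbac.authorization.k8s.io
```

Note that access to Secrets in that namespace still exposes BMC credentials, which is exactly the limitation discussed below: this separates namespaces and teams, not trust levels within a shared BMH pool.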

This adds a proposal for how to implement the CAPI multi-tenancy
contract for CAPM3.

Signed-off-by: Lennart Jern <[email protected]>
@lentzi90 lentzi90 force-pushed the lentzi90/multi-tenancy branch from 1198ab9 to 828b7c9 Compare October 23, 2024 06:40
@pierrecregut

For me, the important part is separating management of BMHs from building clusters. Currently both these things must happen in the same cluster in the same namespace. I.e. if you want to create a cluster out of BMHs you will first have to create the BMHs or ask someone to do it for you in the very namespace you are planning to work with. I want to make it possible to split this up so that one team can set up a BMH management cluster where they handle pools of BMHs. They can then provide credentials to other teams that want to use the BMHs.

I fully agree it is an important problem. But if you want to split responsibilities, you must be able to assign different roles to hardware teams and cluster teams. Unfortunately, for this proposal to work, you need to give full administrative rights over the hardware to cluster administrators, and if you want hardware resources to be used by several clusters, those cluster administrators will have administrative rights over the servers of the other customers. So even if cluster administrators do not see cluster definitions they do not own, they can still reconstruct them through the BMH secrets.

HostClaims bring the necessary separation of roles between hardware administrators and cluster managers by ensuring that cluster managers can only influence the workload executed on the hardware. My belief is that hardware management teams that are really independent of the cluster teams, yet ready to share the BMC credentials with those customers, are an exception.

To keep the code simple, I would advocate limiting the capability to associate a Metal3Machine with resources on other clusters to the HostClaim case, not BMHs, because of the limitation above.

@lentzi90
Member Author

I fully agree it is an important problem. But if you want to split responsibilities, you must be able to assign different roles to hardware teams and cluster teams. Unfortunately, for this proposal to work, you need to give full administrative rights over the hardware to cluster administrators, and if you want hardware resources to be used by several clusters, those cluster administrators will have administrative rights over the servers of the other customers. So even if cluster administrators do not see cluster definitions they do not own, they can still reconstruct them through the BMH secrets.

HostClaims bring the necessary separation of roles between hardware administrators and cluster managers by ensuring that cluster managers can only influence the workload executed on the hardware. My belief is that hardware management teams that are really independent of the cluster teams, yet ready to share the BMC credentials with those customers, are an exception.

To keep the code simple, I would advocate limiting the capability to associate a Metal3Machine with resources on other clusters to the HostClaim case, not BMHs, because of the limitation above.

I think we want the same thing in the end; we just have different views on how to get there. To me it seems like implementing CAPI's multi-tenancy contract first (without addressing the limitations of CAPM3's direct access to BMHs) will be easier and faster. CAPM3 will work mostly the same. We are just giving it the capability to interact with some of the objects remotely instead of locally.

On the other hand, HostClaims mean a lot of changes to CAPM3, because we need new CRDs and new procedures for how CAPM3 does its fundamental job of connecting CAPI to BMO. I think we can do the remote access part first, as described in this proposal, while still remaining backwards compatible. That way we at least take a step in the right direction without too much work.

@pierrecregut

pierrecregut commented Oct 23, 2024

I am just afraid that the "remote" capability code will be duplicated, but not exactly the same, between HostClaims and BMHs, and that it will be harder to maintain in the end. It is not only about acting on a remote cluster and watching it; at some point we will also need tools to automate the management of those limited administrative rights.

In my initial proposal, where we could chain HostClaims (because of the hybrid capability), I even implemented the capability to work on a remote cluster in a dedicated HostClaim kind (in fact it could be kept without hybrid in mind, by just interpreting the kubeconfig in the HostClaim controller). In that case you do not even have to modify dataTemplates or machine remediation to work on another cluster (but they must understand claim objects). It also had the advantage of being usable without CAPM3: you define a HostClaim to execute an arbitrary OS image, and it is synced with a HostClaim copy on a remote bare-metal-as-a-service manager, where it is associated and executed.

Even if we do not follow that route, reducing the scope of where you need to act on remote objects to hostclaims could reduce the complexity in the capm3 provider.

@lentzi90
Member Author

lentzi90 commented Oct 23, 2024

Even if we do not follow that route, reducing the scope of where you need to act on remote objects to hostclaims could reduce the complexity in the capm3 provider.

I agree, and I think that we may end up doing this later.
Part of the issue with implementing HostClaims right away for me is that it isn't clear where the code should live. We are building abstractions on top of abstractions. It is getting complex and hard to understand. I hear arguments for having it as a completely separate entity, but I can also see many arguments for keeping it within CAPM3 and BMO.
Oh well, I'm getting off topic. My point is that I appreciate the concern, but I think it will be far easier to implement this proposal as-is first, and then later adjust CAPM3 to a more fine-grained access model.

@zaneb
Member

zaneb commented Oct 23, 2024

So it sounds like we all agree that in the long term we need to do two things:

  1. split the BMH CRD so that the HW administrator and provisioner roles can be controlled separately by RBAC
  2. allow the provisioner role to operate from remote namespaces (i.e. a different namespace from where the HW administrator creates the BMH)

The premise of this proposal seems to be that we implement (2) first because... we can, even though it is not useful on its own, in the hope that it turns out to be the right thing for whatever implementation of (1) we might eventually come up with.

I do worry that hope is not a great strategy here. Since we don't have so much as an agreed design for (1), let alone a working implementation we can try out, we are flying blind on whether this will turn out to be the right thing.

But my big objection to this is that it is tied to CAPM3, which means you can only take advantage when provisioning kubernetes clusters with CAPI. In my view, multitenancy should be implemented on the BMO side of the fence because that makes it possible to use multitenancy for provisioning anything. So at best this may turn out to be redundant, and at worst it could prevent us from coming up with the correct design.

@pierrecregut

@zaneb you have perfectly stated the context and the stakes.

Case 1 does not need 2 to be implemented. We can first limit ourselves to the case where the contract boundaries are all in the same cluster.

Case 2 can clearly benefit from the implementation of 1, because once the separation is clearly established for multi-tenancy security, we also know what should be transferred/synchronized. We can even have implementation solutions where this has no impact on the existing CAPM3 code (the remote HostClaim sketched above).

On the other hand, implementing 1 with 2 already available, but not fully designed with 1 in mind, may end up being more complex.

The title of this proposal should reflect that it implements case 2 ("Remote binding"?) rather than "Multi-tenancy" (case 1).

If there were a strong business case to implement 2 very fast, it would make sense to do it, but because of the limitations given above, I don't think there is one.

I also agree that it must be implemented with the BMH. As far as 1 is concerned, besides information exchange, the HostClaim will mainly move the association logic from CAPM3 to the HostClaim controller, implemented alongside the BMH controller. For 2, the controller must be on the cluster with CAPM3... but a solution outside the CAPM3 code and usable without it would be better... which brings us back to a HostClaim designed to be synchronized with a remote copy.

@lentzi90
Member Author

The premise of this proposal seems to be that we implement (2) first because... we can, even though it is not useful on its own, in the hope that it turns out to be the right thing for whatever implementation of (1) we might eventually come up with.

I don't know how to put this more clearly. This proposal in itself is useful:

  • It addresses Reuse CAPI e2e test (cluster-api-provider-metal3#1080)
  • It implements a contract that CAPI has defined for providers. A contract that CAPI users can be expected to already use with other providers. Aligning CAPM3 with the CAPI community is worth something in itself to avoid surprises and letdowns.
  • It allows users sharing a cluster to use multiple different BMH pools that are managed elsewhere.

I admit that it does not address the use case of multiple users sharing the same BMH pool in a safe way, and that it does not reduce the access rights CAPM3 requires for BMHs. But I definitely do not agree that this makes it useless.

But my big objection to this is that it is tied to CAPM3, which means you can only take advantage when provisioning kubernetes clusters with CAPI. In my view, multitenancy should be implemented on the BMO side of the fence because that makes it possible to use multitenancy for provisioning anything. So at best this may turn out to be redundant, and at worst it could prevent us from coming up with the correct design.

I think we agree here actually. This was exactly why I wanted to split it out and do this first. This proposal does not affect BMO in any way. It only allows CAPM3 to interact with multiple BMO instances. This is what CAPI considers multi-tenancy. The contract requires changes to CAPM3. This is not something that can be done in BMO.

I do agree that another kind of multi-tenancy is needed in BMO though, and I believe that it will be easier to iron it out once we have this piece in place. Why? Because the CAPI contract is already fixed. There is no way around it. It will be easier to relate to it when we have it implemented instead of trying to design a BMO multi-tenancy "blind" and then hoping it will fit with the already defined CAPI multi-tenancy.

The title of this proposal should reflect that it implements case 2 ("Remote binding"?) rather than "Multi-tenancy" (case 1).

I can change the title if it helps. However, this whole proposal is targeted at CAPM3, which is a CAPI provider. Implementing the CAPI multi-tenancy contract is definitely about multi-tenancy, not remote binding. Would it be better with "CAPI multi-tenancy proposal"?

I agree that this is not BMO multi-tenancy. In BMO we will need different access levels so that some users can consume BMHs while others act as admins. CAPM3 would then end up as a consumer. But this does not in any way address CAPI multi-tenancy. Even with CAPM3 as a consumer, it MUST still be able to accept separate credentials per tenant to fulfill the CAPI contract.

@lentzi90 lentzi90 changed the title Multi-tenancy proposal CAPI Multi-tenancy contract proposal for CAPM3 Oct 24, 2024
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.