Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rethink Reconcile Handling #80

Closed
Kidswiss opened this issue Jun 24, 2020 · 7 comments
Closed

Rethink Reconcile Handling #80

Kidswiss opened this issue Jun 24, 2020 · 7 comments
Assignees
Labels
help wanted Extra attention is needed RFC Request for comments

Comments

@Kidswiss
Copy link
Contributor

Kidswiss commented Jun 24, 2020

At first the reconcile functions were rather simple and the functionality was added directly there.

But we're now at a point where this doesn't scale very well. So I'd like to introduce some more structure to the reconcile functions. We have to to things in certain orders, as they could block or affect other steps during the reconcile. All that logic currently resides inside the respective reconcile loops. This makes it very hard to introduce new functionality that has to be done in certain orders. Also there's a lot of repetition as each reconcile has to do various steps that are common for all the reconcile loops (adding certain labels, or writing back the manipulated CR to the API, etc.).

With some inspiration from https://crossplane.io/docs/master/contributing/services_developer_guide.html I'd like to propose some changes:

We'll define an interface(s) that exposes functions for:

  • Fetching the CR
  • Checking the CR against all mandatory fields (labels etc.)
  • Comparing the state of the external resource (Vault, Git, etc) against the CR definition
  • Triggering the state change for the external resource
  • Reflect the actual state in the CR
  • Write the CR back to the K8s API
  • Other things and maybe some future functionality

These functions are roughly in two categories: determine the state and apply the state and may be split from each other.

Then we'd need some controller, that will go through these functions and determine what actions have to be taken. This sounds like something that could easily be modelled by a FSM (finite state machine). Where it will transition through the various possible states until it reaches the final state where everything is in sync.

This has various benefits:

@Kidswiss
Copy link
Contributor Author

After a detailed call with @corvus-ch my idea to use a full blown FSM may be overkill. His suggestions:

We can use a weighted list, where all the various, small steps are added. Each of these steps has a single responsibility, like addLabel, CreateGit, RemoveSecrets, etc.

In this new design, the reconcile function only fetches the necessary information from the cluster to create a state. The list of steps will be filtered according to the state, to get a smaller list of steps that need to be executed. This makes it very easy to extend the reconcile logic, as well as adding the steps at the right point.

@Kidswiss
Copy link
Contributor Author

Interesting article by redhat, on best practices for reconciling: https://www.openshift.com/blog/kubernetes-operators-best-practices

@srueg srueg added RFC Request for comments help wanted Extra attention is needed labels Jul 2, 2020
@Kidswiss
Copy link
Contributor Author

I created a POC implementing the idea discussed: https://github.com/Kidswiss/execution-engine

With that it's easy to add more functionality to a reconcile flow, especially for logic, that should be added to all the CRDs.

@srueg
Copy link
Contributor

srueg commented Sep 30, 2020

As discussed, we should probably make sure each reconcile loop only updates it's own CRs:
The cluster_reconcile should only update Cluster objects and not tenants for example. The tenant reconcile should find a list of clusters for each tenant and update the tenant object accordingly.

This should solve the conflicts we are experiencing currently.

@Kidswiss
Copy link
Contributor Author

From the last call about this:

  • It doesn't look like having Git Repositories as a separate CRD adds much in terms of functionality -> check for use cases before removing
  • Let's cleanup the reconcile functions before we migrate to the OperatorSDK 1.0

@Kidswiss Kidswiss self-assigned this Oct 20, 2020
@Kidswiss
Copy link
Contributor Author

The refactoring is completed.

The various CRDs now fetch the necessary information now instead of being pushed. That got rid of all the race conditions we observed before. One issue though: the cluster files in the tenant repository will only be applied on the next tenant reconcile.

If the cluster controller sets the right owner by itself, it should be possible to accelerate that. Then the cluster will be observable as a secondary resource.

@srueg
Copy link
Contributor

srueg commented Oct 20, 2020

We have an issue to implement according ownerReferences: #45

@Kidswiss Kidswiss mentioned this issue Oct 22, 2020
4 tasks
@srueg srueg closed this as completed Oct 29, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Extra attention is needed RFC Request for comments
Projects
None yet
Development

No branches or pull requests

2 participants