Rethink Reconcile Handling #80

Kidswiss · 2020-06-24T11:16:54Z

At first the reconcile functions were rather simple and the functionality was added directly there.

But we're now at a point where this doesn't scale very well, so I'd like to introduce some more structure to the reconcile functions. We have to do things in a certain order, as they could block or affect other steps during the reconcile. All that logic currently resides inside the respective reconcile loops, which makes it very hard to introduce new functionality that has to run at a specific point in that order. There's also a lot of repetition, as each reconcile has to do various steps that are common to all the reconcile loops (adding certain labels, writing the manipulated CR back to the API, etc.).

With some inspiration from https://crossplane.io/docs/master/contributing/services_developer_guide.html I'd like to propose some changes:

We'll define an interface(s) that exposes functions for:

Fetching the CR
Checking the CR against all mandatory fields (labels etc.)
Comparing the state of the external resource (Vault, Git, etc) against the CR definition
Triggering the state change for the external resource
Reflecting the actual state in the CR
Writing the CR back to the K8s API
Other things and maybe some future functionality

These functions fall roughly into two categories, determining the state and applying the state, which could be split into separate interfaces.
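A minimal sketch of what such interfaces could look like, assuming a simplified `Resource` type instead of the real CRs. All names here (`StateReader`, `StateWriter`, `Reconcile`, `memoryReconciler`, the `managed` label) are hypothetical and only illustrate the split between determining and applying the state:

```go
package main

import (
	"errors"
	"fmt"
)

// Resource is a hypothetical stand-in for one of the operator's CRs.
type Resource struct {
	Name   string
	Labels map[string]string
	Synced bool // reflects the observed state of the external resource
}

// StateReader groups the functions that determine the state.
type StateReader interface {
	Fetch(name string) (*Resource, error)
	Validate(r *Resource) error
	InSync(r *Resource) (bool, error)
}

// StateWriter groups the functions that apply the state.
type StateWriter interface {
	Apply(r *Resource) error
	Reflect(r *Resource) error
	Save(r *Resource) error
}

// Reconciler is what each CR type would have to implement.
type Reconciler interface {
	StateReader
	StateWriter
}

// Reconcile runs the same workflow for every Reconciler implementation.
func Reconcile(rec Reconciler, name string) error {
	r, err := rec.Fetch(name)
	if err != nil {
		return fmt.Errorf("fetch %q: %w", name, err)
	}
	if err := rec.Validate(r); err != nil {
		return fmt.Errorf("validate %q: %w", name, err)
	}
	inSync, err := rec.InSync(r)
	if err != nil {
		return fmt.Errorf("compare %q: %w", name, err)
	}
	if !inSync {
		if err := rec.Apply(r); err != nil {
			return fmt.Errorf("apply %q: %w", name, err)
		}
	}
	if err := rec.Reflect(r); err != nil {
		return fmt.Errorf("reflect %q: %w", name, err)
	}
	return rec.Save(r)
}

// memoryReconciler is a toy implementation: store plays the K8s API,
// external plays the external system (Vault, Git, etc.).
type memoryReconciler struct {
	store    map[string]*Resource
	external map[string]bool
}

func (m *memoryReconciler) Fetch(name string) (*Resource, error) {
	r, ok := m.store[name]
	if !ok {
		return nil, errors.New("not found")
	}
	return r, nil
}

func (m *memoryReconciler) Validate(r *Resource) error {
	if r.Labels["managed"] == "" {
		return errors.New("missing mandatory label: managed")
	}
	return nil
}

func (m *memoryReconciler) InSync(r *Resource) (bool, error) { return m.external[r.Name], nil }
func (m *memoryReconciler) Apply(r *Resource) error          { m.external[r.Name] = true; return nil }
func (m *memoryReconciler) Reflect(r *Resource) error        { r.Synced = m.external[r.Name]; return nil }
func (m *memoryReconciler) Save(r *Resource) error           { m.store[r.Name] = r; return nil }

func main() {
	rec := &memoryReconciler{
		store:    map[string]*Resource{"c1": {Name: "c1", Labels: map[string]string{"managed": "true"}}},
		external: map[string]bool{},
	}
	if err := Reconcile(rec, "c1"); err != nil {
		fmt.Println("reconcile failed:", err)
		return
	}
	fmt.Println("synced:", rec.store["c1"].Synced)
}
```

Keeping the read and write halves as separate interfaces would let a dry-run or test harness use only the `StateReader` part.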

Then we'd need some controller that goes through these functions and determines what actions have to be taken. This sounds like something that could easily be modelled as an FSM (finite state machine), which transitions through the various possible states until it reaches the final state where everything is in sync.

This has various benefits:

The workflow is exactly the same for all our CRs defined by the FSM
Adding new CRs to the operator is as easy as implementing the interface(s) and feeding them to the FSM
New states can be added easily; a Go example of a flexible FSM: https://levelup.gitconnected.com/implement-a-finite-state-machine-in-golang-f0438b6bc0a8
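To make the FSM idea more concrete, here is a rough sketch, not taken from the linked article. The state names, the `Context` type and the transition cap are assumptions made for illustration only. Each state has a handler that does its work and returns the next state, and the machine runs until the final state is reached:

```go
package main

import "fmt"

// State names are hypothetical; a real machine would mirror the actual reconcile steps.
type State string

const (
	StateFetch    State = "fetch"
	StateValidate State = "validate"
	StateCompare  State = "compare"
	StateApply    State = "apply"
	StateSynced   State = "synced" // final state: everything is in sync
)

// maxTransitions guards against handlers that never reach the final state.
const maxTransitions = 100

// Context carries the data shared between states.
type Context struct {
	InSync bool    // whether the external resource matches the CR
	Trace  []State // states visited, in order
}

// Handler does the work for one state and returns the next state.
type Handler func(ctx *Context) (State, error)

// Machine maps each non-final state to its handler.
type Machine struct {
	handlers map[State]Handler
	final    State
}

// Run transitions from start until the final state is reached.
func (m *Machine) Run(start State, ctx *Context) error {
	current := start
	for steps := 0; current != m.final; steps++ {
		if steps >= maxTransitions {
			return fmt.Errorf("final state not reached after %d transitions", maxTransitions)
		}
		h, ok := m.handlers[current]
		if !ok {
			return fmt.Errorf("no handler for state %q", current)
		}
		ctx.Trace = append(ctx.Trace, current)
		next, err := h(ctx)
		if err != nil {
			return fmt.Errorf("state %q: %w", current, err)
		}
		current = next
	}
	ctx.Trace = append(ctx.Trace, current)
	return nil
}

// NewReconcileMachine wires up a toy reconcile workflow.
func NewReconcileMachine() *Machine {
	return &Machine{
		final: StateSynced,
		handlers: map[State]Handler{
			StateFetch:    func(*Context) (State, error) { return StateValidate, nil },
			StateValidate: func(*Context) (State, error) { return StateCompare, nil },
			StateCompare: func(ctx *Context) (State, error) {
				if ctx.InSync {
					return StateSynced, nil
				}
				return StateApply, nil
			},
			StateApply: func(ctx *Context) (State, error) {
				ctx.InSync = true // pretend the external resource was updated
				return StateCompare, nil
			},
		},
	}
}

func main() {
	ctx := &Context{}
	if err := NewReconcileMachine().Run(StateFetch, ctx); err != nil {
		fmt.Println("reconcile failed:", err)
		return
	}
	fmt.Println(ctx.Trace)
}
```

Adding a new state in this sketch means adding one constant and one handler, and adjusting the handler that should transition into it.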

Kidswiss · 2020-06-24T12:34:27Z

After a detailed call with @corvus-ch, it looks like my idea to use a full-blown FSM may be overkill. His suggestion:

We can use a weighted list to which all the various small steps are added. Each of these steps has a single responsibility, like addLabel, CreateGit, RemoveSecrets, etc.

In this new design, the reconcile function only fetches the necessary information from the cluster to create a state. The list of steps is then filtered according to that state, which yields a smaller list of steps that need to be executed. This makes it very easy to extend the reconcile logic and to add new steps at the right point.
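The weighted list could look roughly like the sketch below. This is not the code from any existing implementation; the `State` fields, the `Applies` predicate and the weights are assumptions chosen for illustration, while the step names are the ones mentioned above. Steps are filtered against the state once, sorted by weight, and then executed in order:

```go
package main

import (
	"fmt"
	"sort"
)

// State is what the reconcile function gathers from the cluster (hypothetical fields).
type State struct {
	Deleted   bool
	HasLabels bool
	GitExists bool
	Log       []string // names of the steps that ran, in order
}

// Step has a single responsibility and a weight that fixes its position.
type Step struct {
	Name    string
	Weight  int
	Applies func(s *State) bool  // filter: does this step need to run?
	Run     func(s *State) error // the actual work
}

// Plan filters the steps against the state and orders them by weight.
func Plan(steps []Step, s *State) []Step {
	var plan []Step
	for _, st := range steps {
		if st.Applies(s) {
			plan = append(plan, st)
		}
	}
	sort.SliceStable(plan, func(i, j int) bool { return plan[i].Weight < plan[j].Weight })
	return plan
}

// Execute runs the planned steps and stops at the first error.
func Execute(steps []Step, s *State) error {
	for _, st := range Plan(steps, s) {
		if err := st.Run(s); err != nil {
			return fmt.Errorf("step %s: %w", st.Name, err)
		}
		s.Log = append(s.Log, st.Name)
	}
	return nil
}

// defaultSteps is deliberately unordered; the weights decide the order.
func defaultSteps() []Step {
	return []Step{
		{
			Name: "RemoveSecrets", Weight: 30,
			Applies: func(s *State) bool { return s.Deleted },
			Run:     func(s *State) error { return nil },
		},
		{
			Name: "CreateGit", Weight: 20,
			Applies: func(s *State) bool { return !s.Deleted && !s.GitExists },
			Run:     func(s *State) error { s.GitExists = true; return nil },
		},
		{
			Name: "addLabel", Weight: 10,
			Applies: func(s *State) bool { return !s.Deleted && !s.HasLabels },
			Run:     func(s *State) error { s.HasLabels = true; return nil },
		},
	}
}

func main() {
	s := &State{}
	if err := Execute(defaultSteps(), s); err != nil {
		fmt.Println("reconcile failed:", err)
		return
	}
	fmt.Println(s.Log)
}
```

Note that in this sketch the filter is evaluated once against the initial state, so a step cannot be enabled by an earlier step within the same run. Whether that is desirable is a design decision.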

Kidswiss · 2020-06-30T11:15:24Z

Interesting article by Red Hat on best practices for reconciling: https://www.openshift.com/blog/kubernetes-operators-best-practices

Kidswiss · 2020-07-29T12:52:20Z

I created a POC implementing the idea discussed: https://github.com/Kidswiss/execution-engine

With that, it's easy to add more functionality to a reconcile flow, especially logic that should apply to all the CRDs.

srueg · 2020-09-30T09:23:19Z

As discussed, we should probably make sure each reconcile loop only updates its own CRs:
The cluster reconcile should only update Cluster objects and not tenants, for example. The tenant reconcile should find the list of clusters for each tenant and update the tenant object accordingly.

This should solve the conflicts we are experiencing currently.
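A small sketch of that pull-based approach, using plain Go types instead of the real API objects. `Store`, `Cluster`, `Tenant` and `ReconcileTenant` are hypothetical names. The tenant reconcile only reads the clusters and is the only place that writes the tenant:

```go
package main

import (
	"fmt"
	"sort"
)

// Cluster and Tenant are simplified, hypothetical stand-ins for the CRs.
type Cluster struct {
	Name   string
	Tenant string // name of the tenant this cluster belongs to
}

type Tenant struct {
	Name     string
	Clusters []string // names of the clusters found for this tenant
}

// Store plays the role of the K8s API in this sketch.
type Store struct {
	Clusters []Cluster
	Tenants  map[string]*Tenant
}

// ReconcileTenant looks up the tenant's clusters and updates only the tenant.
// Cluster objects are read, never written.
func ReconcileTenant(s *Store, name string) error {
	t, ok := s.Tenants[name]
	if !ok {
		return fmt.Errorf("tenant %q not found", name)
	}
	var names []string
	for _, c := range s.Clusters {
		if c.Tenant == name {
			names = append(names, c.Name)
		}
	}
	sort.Strings(names) // stable order avoids spurious updates
	t.Clusters = names
	return nil
}

func main() {
	s := &Store{
		Clusters: []Cluster{{Name: "c2", Tenant: "t1"}, {Name: "c1", Tenant: "t1"}, {Name: "c3", Tenant: "t2"}},
		Tenants:  map[string]*Tenant{"t1": {Name: "t1"}, "t2": {Name: "t2"}},
	}
	if err := ReconcileTenant(s, "t1"); err != nil {
		fmt.Println("reconcile failed:", err)
		return
	}
	fmt.Println(s.Tenants["t1"].Clusters)
}
```

Because each object has exactly one writer in this model, two reconcile loops can no longer overwrite each other's changes to the same object.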

Kidswiss · 2020-10-19T12:54:40Z

From the last call about this:

Having Git repositories as a separate CRD doesn't seem to add much in terms of functionality, so we should check for use cases before removing it
Let's clean up the reconcile functions before we migrate to Operator SDK 1.0

Kidswiss · 2020-10-20T14:34:04Z

The refactoring is completed.

The various CRDs now fetch the necessary information themselves instead of having it pushed to them. That got rid of all the race conditions we observed before. One issue remains, though: the cluster files in the tenant repository will only be applied on the next tenant reconcile.

If the cluster controller sets the right owner by itself, it should be possible to speed that up, because the cluster would then be observable as a secondary resource.

srueg · 2020-10-20T14:38:23Z

We have an issue to implement the corresponding ownerReferences: #45

srueg added the RFC (Request for comments) and help wanted (Extra attention is needed) labels Jul 2, 2020

Kidswiss self-assigned this Oct 20, 2020

Kidswiss mentioned this issue Oct 22, 2020: Refactor reconciler #120 (merged)

srueg closed this as completed Oct 29, 2020
