Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Syncing latest changes from main for ramen #207

Merged
merged 42 commits into from
Mar 2, 2024
Merged

Conversation

df-build-team
Copy link

PR containing the latest commits from main branch

Benamar Mekhissi and others added 30 commits February 5, 2024 08:56
This commit introduces changes to disown PVC resources when a VRG is deleted,
particularly in scenarios where the DR is disabled. The motivation behind this
adjustment is to give back the management of the PVC lifecycle to OCM.

Upon disabling DR, the VRG now iterates through all volsync-managed PVCs.
It removes VRG ownership from these PVCs and reinstates OCM annotations.
This ensures that PVCs are not prematurely garbage collected upon VRG deletion.

In order for that to work, a new annotation needs to be added to the VRG to
indicate that the PVCs should be disowned upon DR disabling
drplacementcontrol.ramendr.openshift.io/do-not-delete-pvc: "true"

Signed-off-by: Benamar Mekhissi <[email protected]>
This commit changes the name and the values that we log with. The logger
is created with the name of the controller. The logger also adds the
values of customresource, its UID and the version. This makes it very
easy to filter out the log messages based on those attributes
incrementally, thereby allowing one to look at the reconcile logs at the
the lowest granularity. The reconcile exit also logs the time spent for
reconcile.

Signed-off-by: Raghavendra Talur <[email protected]>
Signed-off-by: Raghavendra Talur <[email protected]>
Signed-off-by: Elena Gershkovich <[email protected]>
Signed-off-by: Raghavendra Talur <[email protected]>
The verbosity was useful maybe early in the project, it doesn't help
much anymore.

This change also makes the default value good enough for the CI,
therefore don't override the env in the CI.

Signed-off-by: Raghavendra Talur <[email protected]>
This also lets us have a default KUBEBUILDER_ASSETS file and simplifies
the Makefile.

Signed-off-by: Raghavendra Talur <[email protected]>
this refactoring helps to set metrics even though
one of the cluster is down, at this time,the metrics
value is set as 0 and alerts are fired.

Signed-off-by: rakeshgm <[email protected]>
Signed-off-by: Raghavendra Talur <[email protected]>
Current check for metro actions that use isMetroAction passed
in the to and from clusters to check equivalence of region
values. This would succeed when to and from clusters are the
same, which can happen when the preferredCluster and the
failoverCluster are the same (Deploy->Failover->Failover).

As a result in a regional DR policy, such a state would enter
processing or waiting for fencing to take effect before
proceeding with the Failover.

This commit switches to the policy based metro determination
instead of the to and from clusters as before to resolve the
issue.

Signed-off-by: Shyamsundar Ranganathan <[email protected]>
…ionDestination resource

post successful PV/PVC restore

The issue arises during successful restoration of PV/PVC when the reconciliation proceeds to
set up the ReplicationSource without saving the restore status. Subsequently, if a failure
occurs or Ramen restarts before ReplicationSource setup completion, the next reconciliation
fails due to the previously deleted ReplicationDestination.

To address this, upon restore completion, reconciliation will exit, ensuring status update.
In subsequent reconciliations, the restore is bypassed, and the ReplicationSource setup step
will start.

Signed-off-by: Benamar Mekhissi <[email protected]>
Signed-off-by: Benamar Mekhissi <[email protected]>
When the request for DRPC to delete is received, we remove the finalizer
and then delete the metrics. The object is then processed for deletion.
The API server may not delete immediately and it may take some time for
garbage collection to complete the request. Ramen may get the reconcile
call again for which ramen may get the DRPC object which is the deletion
state. At this point in time, we try to set the metrics with zero value
and alerts are being fired. The fix here is that metrics will only be
set if DRPC is not being deleted

Signed-off-by: rakeshgm <[email protected]>
Signed-off-by: Shyamsundar Ranganathan <[email protected]>
This helps determine using an MCV later if the VRG on the managed
cluster was created by a manifest work from the current DRPC
resource instance.

In cases of hub recovery this will mismatch as the DRPC is
recreated on the hub and hence would have a different UID and
helps to not orphan VRGs on the managed clusters due to missing
hub manifest work for the same.

The change to add the UID in the initial manifest work creation
ensures other parts of the code that fetch the manifest and update
it are not impacted. IOW, if a manifest work exists for a VRG it
is created by the current instance of the DRPC as manifest work
resources are not recovered on a hub recovery or in a declarative
creation of the DRPC.

Signed-off-by: Shyamsundar Ranganathan <[email protected]>
Adoption ensures that there is a ManifestWork that manages the VRG
resource on the ManagedCluster.

In the case of hub recovery, we would find VRGs on the managed cluster
without any ManifestWork for the same on the hub. This change ensures
that such missing ManifestWork is created on the hub by the current
DRPC resource.

It does this by storing the DRPC resource UID as an annotation on the
created VRG, and on any discrepency ensures that the current DRPC creates
a manifest work for the VRG on the new hub.

The case for upgrade is also handled, where a manifest work may be present
but without the DRPC UID annotation on it.

Signed-off-by: Shyamsundar Ranganathan <[email protected]>
Test covers UID adoption by faking returns from MCV and also
pre-creating an MW for one of the managed clusters.

This covers most cases where the VRG needs to be adopted due to
upgrade cases, or due to it being orphaned.

Signed-off-by: Shyamsundar Ranganathan <[email protected]>
ShyamsundarR and others added 12 commits February 21, 2024 08:20
Signed-off-by: Shyamsundar Ranganathan <[email protected]>
Explain why we need Go 1.20 and how to maintain multiple Go versions so
ramen can be built and tested while using newer default Go version.

Signed-off-by: Nir Soffer <[email protected]>
When comparing PVs, skip comparing unset "Spec.ClaimRef.kind". This
breaks validation when using KubeVirt VM, and actual resources in the
system do not match the backed up resources in the s3 store. It is
correct to ignore unset kind since this is an optional field[1].

Previously we failed with:

    Failed to restore PVs: failed to restore ClusterData for VolRep
    (failed to restore PVs and PVCs using profile list
    ([s3profile-perf8-ocs-storagecluster]): failed to restore all
    []v1.PersistentVolume. Total/Restored 1/0)

And then the VRG will not make any progress. Now we consider unset
"kind" as equal and continue the flow normally.

[1] https://github.com/kubernetes/api/blob/f3648a53522eb60ea75d70d36a50c799f7e4e23b/core/v1/types.go#L6381

Bug: https://bugzilla.redhat.com/2262455
Signed-off-by: Nir Soffer <[email protected]>
This current rook version fails to take latest ci changes.
This commit updates the rook version.

Signed-off-by: yati1998 <[email protected]>
Submariner renamed the `Synced` condition to `Ready`[1]. Good change but
it broke our tests.

[1] submariner-io/lighthouse#1211

Signed-off-by: Nir Soffer <[email protected]>
Signed-off-by: Raghavendra Talur <[email protected]>
Signed-off-by: Raghavendra Talur <[email protected]>
Earlier, we were logging the maintenance error and also setting the
requeue flag but the error was not being returned to the reconciler.

Signed-off-by: Raghavendra Talur <[email protected]>
@df-build-team df-build-team requested a review from a team March 1, 2024 23:33
Copy link

openshift-ci bot commented Mar 1, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: df-build-team

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ShyamsundarR ShyamsundarR merged commit dda4ced into release-4.16 Mar 2, 2024
50 of 51 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants