Add Support for Configuring CephCluster HealthCheck Settings in OCS Operator #2940

Open · wants to merge 1 commit into base: main
3 changes: 3 additions & 0 deletions api/v1/storagecluster_types.go
@@ -216,6 +216,9 @@ type ManageCephCluster struct {

// Whether to allow updating the device class after the OSD is initially provisioned
AllowDeviceClassUpdate bool `json:"allowDeviceClassUpdate,omitempty"`

// CephClusterHealthCheckSpec represents the health check settings for Ceph daemons
HealthCheck *rookCephv1.CephClusterHealthCheckSpec `json:"healthCheck,omitempty"`
}

// ManageCephConfig defines how to reconcile the Ceph configuration
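For illustration, a StorageCluster CR using the new field might look roughly like the sketch below. This is a hedged example only: the managedResources/cephCluster/healthCheck path follows the Go field names and the healthCheck JSON tag above, the daemonHealth/mon/osd/status keys follow Rook's CephClusterHealthCheckSpec serialization, and all values are illustrative.

```yaml
apiVersion: ocs.openshift.io/v1
kind: StorageCluster
metadata:
  name: ocs-storagecluster        # example name
spec:
  managedResources:
    cephCluster:
      healthCheck:                # new optional field added by this PR
        daemonHealth:
          mon:
            disabled: false
            interval: 45s         # illustrative value
          osd:
            disabled: false
            interval: 60s         # illustrative value
          status:
            timeout: 60s          # illustrative value
```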
5 changes: 5 additions & 0 deletions api/v1/zz_generated.deepcopy.go

Some generated files are not rendered by default.

396 changes: 396 additions & 0 deletions config/crd/bases/ocs.openshift.io_storageclusters.yaml

Large diffs are not rendered by default.

4 changes: 4 additions & 0 deletions controllers/storagecluster/cephcluster.go
@@ -511,6 +511,10 @@ func newCephCluster(sc *ocsv1.StorageCluster, cephImage string, kmsConfigMap *co
cephCluster.Spec.DisruptionManagement.OSDMaintenanceTimeout = sc.Spec.ManagedResources.CephCluster.OsdMaintenanceTimeout
}

if sc.Spec.ManagedResources.CephCluster.HealthCheck != nil {
cephCluster.Spec.HealthCheck = *sc.Spec.ManagedResources.CephCluster.HealthCheck

Member: Can a user set nil to disable the health check? Is that allowed?

Contributor: nil would cause Rook's defaults to be used. To disable the health checks, there are subsettings for enabled: false. See the example here.
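For reference, disabling an individual check through those subsettings might look like the following in the StorageCluster spec. This is a sketch only: the disabled flag mirrors the HealthCheckSpec field exercised in the tests below, the mon key is an example daemon, and the surrounding path is assumed from the new API field.

```yaml
spec:
  managedResources:
    cephCluster:
      healthCheck:
        daemonHealth:
          mon:
            disabled: true   # turn off only the mon health check; other checks keep Rook defaults
```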

Member: @travisn in that case, how can someone go back to the default? Since there is a nil check here, won't that prevent it?

Contributor: I had the following comment yesterday, but missed submitting it. Looks like the conversation is already resolved though...

If sc.Spec.ManagedResources.CephCluster.HealthCheck == nil, the CephCluster CR would be generated without the settings, which would cause Rook to use the defaults, the same as before this change. cephCluster.Spec.HealthCheck is not nullable, so we can't set it to nil. Leaving the struct empty will effectively enable the defaults.

Contributor Author: I will try to test this scenario on a live cluster.

Member: I was just wondering about the case where a user creates the StorageCluster without sc.Spec.ManagedResources.CephCluster.HealthCheck, later sets it in the StorageCluster and gets the expected changes in the CephCluster, and then removes the HealthCheck setting from the StorageCluster again. At that point sc.Spec.ManagedResources.CephCluster.HealthCheck will presumably be nil, but the ocs-operator doesn't revert the changes on the CephCluster CR, so the CephCluster CR will still show the older values that the ocs-operator added earlier.

Contributor Author: @Madhu-1 I did not test this scenario. I can deploy a new cluster and test it. In addition, I will update the code based on your comment.

Contributor Author: @Madhu-1 @travisn @iamniting
Some tests failed after removing the if condition sc.Spec.ManagedResources.CephCluster.HealthCheck == nil.
Would you like to update the struct definition from:
HealthCheck *rookCephv1.CephClusterHealthCheckSpec `json:"healthCheck,omitempty"`
to:
HealthCheck rookCephv1.CephClusterHealthCheckSpec `json:"healthCheck,omitempty"`
to avoid working with a pointer?

Contributor: Per my comment above, we need that check for nil as you had it originally. Let's keep the pointer, so we can better know if the user has any customized settings or not.

Contributor Author: @travisn @Madhu-1 @iamniting I have incorporated Madhu's and Travis's feedback into the code. Once everyone has reviewed and agreed on the changes, I will proceed to test it on an OCP cluster with a private image.

}

if sc.Spec.LogCollector != nil {
if sc.Spec.LogCollector.Periodicity != "" {
cephCluster.Spec.LogCollector.Periodicity = sc.Spec.LogCollector.Periodicity
71 changes: 71 additions & 0 deletions controllers/storagecluster/cephcluster_test.go
@@ -1539,6 +1539,77 @@ func TestEnsureUpgradeReliabilityParams(t *testing.T) {
assert.Equal(t, 45*time.Minute, expected.Spec.DisruptionManagement.OSDMaintenanceTimeout)
}

func TestHealthCheckConfiguration(t *testing.T) {
sc := &ocsv1.StorageCluster{}
mockStorageCluster.DeepCopyInto(sc)
interval := metav1.Duration{
Duration: 20 * time.Second,
}
mockProbeSpec := &rookCephv1.ProbeSpec{
Disabled: false,
Probe: &corev1.Probe{
InitialDelaySeconds: 10,
TimeoutSeconds: 5,
},
}
probeMap := make(map[rookCephv1.KeyType]*rookCephv1.ProbeSpec)
probeMap["abc"] = mockProbeSpec

sc.Spec.ManagedResources.CephCluster.HealthCheck = &rookCephv1.CephClusterHealthCheckSpec{
DaemonHealth: rookCephv1.DaemonHealthSpec{
Status: rookCephv1.HealthCheckSpec{
Timeout: "11",
Disabled: false,
Interval: &interval,
},
Monitor: rookCephv1.HealthCheckSpec{
Timeout: "22",
Disabled: true,
Interval: &interval,
},
ObjectStorageDaemon: rookCephv1.HealthCheckSpec{
Timeout: "33",
Disabled: false,
Interval: &interval,
},
},
StartupProbe: probeMap,
LivenessProbe: probeMap,
}
expected := newCephCluster(sc, "", nil, log)

assert.Equal(t, "11", expected.Spec.HealthCheck.DaemonHealth.Status.Timeout)
assert.Equal(t, false, expected.Spec.HealthCheck.DaemonHealth.Status.Disabled)
assert.Equal(t, &interval, expected.Spec.HealthCheck.DaemonHealth.Status.Interval)

assert.Equal(t, "22", expected.Spec.HealthCheck.DaemonHealth.Monitor.Timeout)
assert.Equal(t, true, expected.Spec.HealthCheck.DaemonHealth.Monitor.Disabled)
assert.Equal(t, &interval, expected.Spec.HealthCheck.DaemonHealth.Monitor.Interval)

assert.Equal(t, "33", expected.Spec.HealthCheck.DaemonHealth.ObjectStorageDaemon.Timeout)
assert.Equal(t, false, expected.Spec.HealthCheck.DaemonHealth.ObjectStorageDaemon.Disabled)
assert.Equal(t, &interval, expected.Spec.HealthCheck.DaemonHealth.ObjectStorageDaemon.Interval)

compareProbeMaps(t, probeMap, expected.Spec.HealthCheck.LivenessProbe)
compareProbeMaps(t, probeMap, expected.Spec.HealthCheck.StartupProbe)

}

// Helper function to compare two maps
func compareProbeMaps(t *testing.T, map1, map2 map[rookCephv1.KeyType]*rookCephv1.ProbeSpec) {
assert.Equal(t, len(map1), len(map2))

for key, value1 := range map1 {
value2, exists := map2[key]
assert.Assert(t, exists, "Key %v not found in map2", key)

// Compare the actual ProbeSpec values
assert.Equal(t, value1.Disabled, value2.Disabled)
assert.Equal(t, value1.Probe.InitialDelaySeconds, value2.Probe.InitialDelaySeconds)
assert.Equal(t, value1.Probe.TimeoutSeconds, value2.Probe.TimeoutSeconds)
}
}

func TestDetermineDefaultCephDeviceClass(t *testing.T) {
cases := []struct {
label string
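The probe maps exercised by this test correspond to the livenessProbe and startupProbe sections of the healthCheck spec. A hypothetical CR snippet follows; the mon/osd keys are example daemon keys (the test above uses a synthetic "abc" key), the probe field name is assumed from Rook's ProbeSpec serialization, and the values mirror the mockProbeSpec in the test.

```yaml
spec:
  managedResources:
    cephCluster:
      healthCheck:
        livenessProbe:
          mon:
            disabled: false
            probe:
              initialDelaySeconds: 10   # mirrors mockProbeSpec in the test
              timeoutSeconds: 5
        startupProbe:
          osd:
            disabled: false
```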
396 changes: 396 additions & 0 deletions deploy/csv-templates/crds/ocs/ocs.openshift.io_storageclusters.yaml

Large diffs are not rendered by default.

396 changes: 396 additions & 0 deletions deploy/ocs-operator/manifests/storagecluster.crd.yaml

Large diffs are not rendered by default.

Some generated files are not rendered by default.
