fix(backendconnection): clean up resources from older operator versions
basti1302 committed Sep 12, 2024
1 parent c893a38 commit ceec292
Showing 9 changed files with 218 additions and 14 deletions.
8 changes: 5 additions & 3 deletions CONTRIBUTING.md
@@ -181,10 +181,12 @@ they deploy in their `AfterAll`/`AfterEach` hooks. The scripts in `test-resource
FILELOG_OFFSET_SYNCH_IMG_PULL_POLICY="" \
test-resources/bin/test-roundtrip-01-aum-operator-cr.sh
```
Note: Explicitly unsetting parameters like `CONTROLLER_IMG_REPOSITORY` (by setting them to an empty string) makes
the scenario omit those values when deploying via Helm, so that the default value from the chart is used. Without
`CONTROLLER_IMG_REPOSITORY=""` being present, the test script will use
`CONTROLLER_IMG_REPOSITORY=operator-controller` (the image built from local sources) as the default setting.
* You can add `OPERATOR_HELM_CHART_VERSION=0.11.0` to the command above to install a specific version of the
Helm chart. This can be useful to test upgrade scenarios.
## Make Targets
6 changes: 5 additions & 1 deletion README.md
@@ -10,7 +10,7 @@ infrastructure to Dash0.
## Description

The Dash0 Kubernetes operator enables gathering OpenTelemetry data from your workloads for a selection of supported
runtimes, as well as automatic log collection and metrics.

The Dash0 Kubernetes operator is currently available as a technical preview.

@@ -22,3 +22,7 @@ Supported runtimes:

The preferred method of installation is via the operator's
[Helm chart](https://github.com/dash0hq/dash0-operator/blob/main/helm-chart/dash0-operator/README.md).

The [Helm chart documentation](https://github.com/dash0hq/dash0-operator/blob/main/helm-chart/dash0-operator/README.md)
also contains all other relevant information for getting started with the operator, such as how to enable Dash0
monitoring for your workloads.
12 changes: 8 additions & 4 deletions helm-chart/dash0-operator/README.md
@@ -12,15 +12,19 @@ Simply install the operator into your cluster to get OpenTelemetry data flowing

The Dash0 Kubernetes operator installs an OpenTelemetry collector into your cluster that sends data to your Dash0
ingress endpoint, with authentication already configured out of the box. Additionally, it will enable gathering
OpenTelemetry data from applications deployed to the cluster for a selection of supported runtimes, as well as
automatic log collection and metrics.

More information on the Dash0 Kubernetes Operator can be found at
https://github.com/dash0hq/dash0-operator/blob/main/README.md.
The Dash0 Kubernetes operator is currently available as a technical preview.

Supported runtimes:

* Node.js 18 and beyond

## Prerequisites

- [Kubernetes](https://kubernetes.io/) >= 1.xx
- [Helm](https://helm.sh) >= 3.x, please refer to Helm's [documentation](https://helm.sh/docs/) for more information
on installing Helm.

## Installation
81 changes: 81 additions & 0 deletions internal/backendconnection/otelcolresources/desired_state.go
@@ -1010,3 +1010,84 @@ func labels(addOptOutLabel bool) map[string]string {
}
return lbls
}

func compileObsoleteResources(namespace string, namePrefix string) []client.Object {
    openTelemetryCollectorSuffix := "opentelemetry-collector"
    openTelemetryCollectorAgentSuffix := "opentelemetry-collector-agent"
    clusterMetricsCollectorSuffix := "cluster-metrics-collector"

    return []client.Object{
        // K8s resources that were created by the operator in versions 0.9.0 to 0.16.0, becoming obsolete with
        // version 0.17.0:
        &corev1.ServiceAccount{
            ObjectMeta: obsoleteResourceObjectMeta(
                namespace, fmt.Sprintf("%s-%s", namePrefix, openTelemetryCollectorSuffix)),
        },
        &corev1.ConfigMap{
            ObjectMeta: obsoleteResourceObjectMeta(
                namespace, fmt.Sprintf("%s-%s", namePrefix, openTelemetryCollectorAgentSuffix)),
        },
        &corev1.ConfigMap{
            ObjectMeta: obsoleteResourceObjectMeta(
                namespace, fmt.Sprintf("%s-%s", namePrefix, "filelogoffsets")),
        },
        &rbacv1.ClusterRole{
            ObjectMeta: metav1.ObjectMeta{
                Name: fmt.Sprintf("%s-%s", namePrefix, openTelemetryCollectorSuffix),
            },
        },
        &rbacv1.ClusterRoleBinding{
            ObjectMeta: metav1.ObjectMeta{
                Name: fmt.Sprintf("%s-%s", namePrefix, openTelemetryCollectorSuffix),
            },
        },
        &rbacv1.Role{
            ObjectMeta: obsoleteResourceObjectMeta(
                namespace, fmt.Sprintf("%s-%s", namePrefix, openTelemetryCollectorSuffix)),
        },
        &rbacv1.RoleBinding{
            ObjectMeta: obsoleteResourceObjectMeta(
                namespace, fmt.Sprintf("%s-%s", namePrefix, openTelemetryCollectorSuffix)),
        },
        &corev1.Service{
            ObjectMeta: obsoleteResourceObjectMeta(
                namespace, fmt.Sprintf("%s-%s", namePrefix, openTelemetryCollectorSuffix)),
        },
        &appsv1.DaemonSet{
            ObjectMeta: obsoleteResourceObjectMeta(
                namespace, fmt.Sprintf("%s-%s", namePrefix, openTelemetryCollectorAgentSuffix)),
        },

        // Additional deployment related resources that were only created in version 0.16.0, also obsolete starting at
        // version 0.17.0:
        &corev1.ServiceAccount{
            ObjectMeta: obsoleteResourceObjectMeta(
                namespace, fmt.Sprintf("%s-%s", namePrefix, clusterMetricsCollectorSuffix)),
        },
        &rbacv1.ClusterRole{
            ObjectMeta: metav1.ObjectMeta{
                Name: fmt.Sprintf("%s-%s", namePrefix, clusterMetricsCollectorSuffix),
            },
        },
        &rbacv1.ClusterRoleBinding{
            ObjectMeta: metav1.ObjectMeta{
                Name: fmt.Sprintf("%s-%s", namePrefix, clusterMetricsCollectorSuffix),
            },
        },
        &corev1.ConfigMap{
            ObjectMeta: obsoleteResourceObjectMeta(
                namespace, fmt.Sprintf("%s-%s", namePrefix, clusterMetricsCollectorSuffix)),
        },
        &appsv1.Deployment{
            ObjectMeta: obsoleteResourceObjectMeta(
                namespace, fmt.Sprintf("%s-%s", namePrefix, clusterMetricsCollectorSuffix)),
        },
    }
}

func obsoleteResourceObjectMeta(namespace string, name string) metav1.ObjectMeta {
    return metav1.ObjectMeta{
        Name:      name,
        Namespace: namespace,
    }
}
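The `compileObsoleteResources` helper above builds "stub" objects that carry only identifying metadata — a name and, for namespaced kinds, a namespace — because a delete request needs nothing more than that identity; the full manifests from the older versions never have to be reconstructed. A dependency-free sketch of the same idea (the `objectKey` type is an illustrative stand-in, not the controller-runtime API, and only a subset of the real resource list is shown):

```go
package main

import "fmt"

// objectKey is the minimal identity a delete request needs: a kind, a name
// and, for namespaced kinds, a namespace.
type objectKey struct {
	Kind      string
	Namespace string // empty for cluster-scoped kinds
	Name      string
}

// compileObsoleteKeys mirrors the structure of compileObsoleteResources:
// it enumerates identities of resources that older operator versions
// created, without reconstructing their full specs.
func compileObsoleteKeys(namespace, namePrefix string) []objectKey {
	collector := fmt.Sprintf("%s-opentelemetry-collector", namePrefix)
	agent := fmt.Sprintf("%s-opentelemetry-collector-agent", namePrefix)
	return []objectKey{
		{Kind: "ServiceAccount", Namespace: namespace, Name: collector},
		{Kind: "ConfigMap", Namespace: namespace, Name: agent},
		{Kind: "ClusterRole", Namespace: "", Name: collector}, // cluster-scoped
		{Kind: "DaemonSet", Namespace: namespace, Name: agent},
	}
}

func main() {
	for _, key := range compileObsoleteKeys("dash0-system", "dash0-operator") {
		fmt.Printf("%s %s/%s\n", key.Kind, key.Namespace, key.Name)
	}
}
```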
52 changes: 48 additions & 4 deletions internal/backendconnection/otelcolresources/otelcol_resources.go
@@ -8,6 +8,7 @@ import (
    "errors"
    "fmt"
    "slices"
    "sync/atomic"

    "github.com/cisco-open/k8s-objectmatcher/patch"
    "github.com/go-logr/logr"
@@ -27,10 +28,11 @@

type OTelColResourceManager struct {
    client.Client
    Scheme                           *runtime.Scheme
    DeploymentSelfReference          *appsv1.Deployment
    OTelCollectorNamePrefix          string
    DevelopmentMode                  bool
    obsoleteResourcesHaveBeenDeleted atomic.Bool
}

const (
@@ -49,6 +51,11 @@ func (m *OTelColResourceManager) CreateOrUpdateOpenTelemetryCollectorResources(
    selfMonitoringConfiguration selfmonitoring.SelfMonitoringConfiguration,
    logger *logr.Logger,
) (bool, bool, error) {
    err := m.deleteObsoleteResourcesFromPreviousOperatorVersions(ctx, namespace, logger)
    if err != nil {
        return false, false, err
    }

    config := &oTelColConfig{
        Namespace:  namespace,
        NamePrefix: m.OTelCollectorNamePrefix,
@@ -277,3 +284,40 @@ func (m *OTelColResourceManager) DeleteResources(
    }
    return nil
}

func (m *OTelColResourceManager) deleteObsoleteResourcesFromPreviousOperatorVersions(
    ctx context.Context,
    namespace string,
    logger *logr.Logger,
) error {
    if m.obsoleteResourcesHaveBeenDeleted.Load() {
        return nil
    }
    obsoleteResources := compileObsoleteResources(
        namespace,
        m.OTelCollectorNamePrefix,
    )
    var allErrors []error
    for _, obsoleteResource := range obsoleteResources {
        err := m.Client.Delete(ctx, obsoleteResource)
        if err != nil {
            if apierrors.IsNotFound(err) {
                // expected, ignore silently
            } else {
                allErrors = append(allErrors, err)
            }
        } else {
            logger.Info(fmt.Sprintf(
                "deleted obsolete resource %s/%s",
                obsoleteResource.GetNamespace(),
                obsoleteResource.GetName(),
            ))
        }
    }
    if len(allErrors) > 0 {
        return errors.Join(allErrors...)
    }

    m.obsoleteResourcesHaveBeenDeleted.Store(true)
    return nil
}
@@ -5,7 +5,9 @@ package otelcolresources

import (
    "context"
    "fmt"

    appsv1 "k8s.io/api/apps/v1"
    corev1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "sigs.k8s.io/controller-runtime/pkg/client"
@@ -124,6 +126,66 @@ var _ = Describe("The OpenTelemetry Collector resource manager", Ordered, func()

        VerifyCollectorResources(ctx, k8sClient, Dash0OperatorNamespace)
    })

    It("should delete outdated resources from older operator versions", func() {
        nameOfOutdatedResources := fmt.Sprintf("%s-opentelemetry-collector-agent", NamePrefix)
        Expect(k8sClient.Create(ctx, &corev1.ConfigMap{
            ObjectMeta: metav1.ObjectMeta{
                Name:      nameOfOutdatedResources,
                Namespace: Dash0OperatorNamespace,
            },
        })).To(Succeed())
        Expect(k8sClient.Create(ctx, &appsv1.DaemonSet{
            ObjectMeta: metav1.ObjectMeta{
                Name:      nameOfOutdatedResources,
                Namespace: Dash0OperatorNamespace,
            },
            Spec: appsv1.DaemonSetSpec{
                Selector: &metav1.LabelSelector{
                    MatchLabels: daemonSetMatchLabels,
                },
                Template: corev1.PodTemplateSpec{
                    ObjectMeta: metav1.ObjectMeta{
                        Labels: daemonSetMatchLabels,
                    },
                    Spec: corev1.PodSpec{
                        Containers: []corev1.Container{
                            {
                                Name:  openTelemetryCollector,
                                Image: CollectorImageTest,
                            },
                        },
                    },
                },
            },
        })).To(Succeed())

        _, _, err :=
            oTelColResourceManager.CreateOrUpdateOpenTelemetryCollectorResources(
                ctx,
                Dash0OperatorNamespace,
                TestImages,
                dash0MonitoringResource,
                selfmonitoring.SelfMonitoringConfiguration{},
                &logger,
            )
        Expect(err).ToNot(HaveOccurred())

        VerifyResourceDoesNotExist(
            ctx,
            k8sClient,
            Dash0OperatorNamespace,
            nameOfOutdatedResources,
            &corev1.ConfigMap{},
        )
        VerifyResourceDoesNotExist(
            ctx,
            k8sClient,
            Dash0OperatorNamespace,
            nameOfOutdatedResources,
            &appsv1.DaemonSet{},
        )
    })
})

Describe("when OpenTelemetry collector resources have been modified externally", func() {
4 changes: 4 additions & 0 deletions test-resources/bin/test-cleanup.sh
@@ -42,10 +42,14 @@ kubectl delete --ignore-not-found=true customresourcedefinition dash0operatorcon

# The following resources are deleted automatically with helm uninstall, unless for example when the operator manager
# crashes and the helm pre-delete helm hook cannot run, then they might be left behind.
kubectl delete clusterrole --ignore-not-found dash0-operator-cluster-metrics-collector-cr
kubectl delete clusterrole --ignore-not-found dash0-operator-manager-role
kubectl delete clusterrole --ignore-not-found dash0-operator-metrics-reader
kubectl delete clusterrole --ignore-not-found dash0-operator-opentelemetry-collector-cr
kubectl delete clusterrole --ignore-not-found dash0-operator-proxy-role
kubectl delete clusterrolebinding --ignore-not-found dash0-operator-cluster-metrics-collector-crb
kubectl delete clusterrolebinding --ignore-not-found dash0-operator-manager-rolebinding
kubectl delete clusterrolebinding --ignore-not-found dash0-operator-opentelemetry-collector-crb
kubectl delete clusterrolebinding --ignore-not-found dash0-operator-proxy-rolebinding
kubectl delete mutatingwebhookconfiguration --ignore-not-found dash0-operator-injector

3 changes: 3 additions & 0 deletions test-resources/bin/util
@@ -31,6 +31,9 @@ build_all_images() {

deploy_via_helm() {
  helm_install_command="helm install --namespace dash0-system"
  if [[ -n "${OPERATOR_HELM_CHART_VERSION:-}" ]]; then
    helm_install_command+=" --version $OPERATOR_HELM_CHART_VERSION"
  fi
  helm_install_command+=" --set operator.developmentMode=true"
  if ! has_been_set_to_empty_string "CONTROLLER_IMG_REPOSITORY"; then
    helm_install_command+=" --set operator.image.repository=${CONTROLLER_IMG_REPOSITORY:-operator-controller}"
4 changes: 2 additions & 2 deletions test/util/collector.go
@@ -300,7 +300,7 @@ func VerifyCollectorResourcesDoNotExist(
        if expectedRes.clusterScoped {
            expectedNamespace = ""
        }
        VerifyResourceDoesNotExist(
            ctx,
            k8sClient,
            expectedNamespace,
@@ -310,7 +310,7 @@ }
    }
}

func VerifyResourceDoesNotExist(
    ctx context.Context,
    k8sClient client.Client,
    namespace string,
