feat(chart): configurable tolerations for daemonset collector pods
This allows daemonset collector pods to be scheduled on nodes that have
a taint.

Also: move the Helm settings for limits/requests for collector containers:
- operator.collectorDaemonSetCollectorContainerResources ->
  operator.collectors.daemonSetCollectorContainerResources
- operator.collectorDaemonSetConfigurationReloaderContainerResources ->
  operator.collectors.daemonSetConfigurationReloaderContainerResources
- operator.collectorDaemonSetFileLogOffsetSynchContainerResources ->
  operator.collectors.daemonSetFileLogOffsetSynchContainerResources
- operator.collectorDeploymentCollectorContainerResources ->
  operator.collectors.deploymentCollectorContainerResources
- operator.collectorDeploymentConfigurationReloaderContainerResources ->
  operator.collectors.deploymentConfigurationReloaderContainerResources

The previous locations for these settings still work as well, for
backwards compatibility.
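With the new layout, a `values.yaml` override would look like this (the setting path follows the list above; the resource values themselves are illustrative only):

```yaml
# Hypothetical values.yaml snippet; the memory values are examples.
operator:
  collectors:
    daemonSetCollectorContainerResources:
      limits:
        memory: 500Mi
      requests:
        memory: 500Mi
```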
basti1302 committed Feb 12, 2025
1 parent c093e0e commit eaef51b
Showing 22 changed files with 861 additions and 501 deletions.
14 changes: 7 additions & 7 deletions cmd/main.go
@@ -92,7 +92,7 @@ const (
debugVerbosityDetailedEnvVarName = "OTEL_COLLECTOR_DEBUG_VERBOSITY_DETAILED"
sendBatchMaxSizeEnvVarName = "OTEL_COLLECTOR_SEND_BATCH_MAX_SIZE"

oTelColResourceSpecConfigFile = "/etc/config/otelcolresources.yaml"
oTelColExtraConfigFile = "/etc/config/otelcolextra.yaml"

//nolint
mandatoryEnvVarMissingMessageTemplate = "cannot start the Dash0 operator, the mandatory environment variable \"%s\" is missing"
@@ -558,12 +558,12 @@ func readEnvironmentVariables(logger *logr.Logger) error {
return nil
}

func readConfiguration() (*otelcolresources.OTelColResourceSpecs, error) {
oTelColResourceSpec, err := otelcolresources.ReadOTelColResourcesConfiguration(oTelColResourceSpecConfigFile)
func readOTelColExtraConfiguration() (*otelcolresources.OTelColExtraConfig, error) {
oTelColExtraSpec, err := otelcolresources.ReadOTelColExtraConfiguration(oTelColExtraConfigFile)
if err != nil {
return nil, fmt.Errorf("Cannot read configuration file %s: %w", oTelColResourceSpecConfigFile, err)
return nil, fmt.Errorf("Cannot read configuration file %s: %w", oTelColExtraConfigFile, err)
}
return oTelColResourceSpec, nil
return oTelColExtraSpec, nil
}

func readOptionalPullPolicyFromEnvironmentVariable(envVarName string) corev1.PullPolicy {
@@ -589,7 +589,7 @@ func startDash0Controllers(
operatorConfiguration *startup.OperatorConfigurationValues,
developmentMode bool,
) error {
oTelColResourceSpecs, err := readConfiguration()
oTelColExtraConfig, err := readOTelColExtraConfiguration()
if err != nil {
os.Exit(1)
}
@@ -639,7 +639,7 @@
Scheme: mgr.GetScheme(),
DeploymentSelfReference: deploymentSelfReference,
OTelCollectorNamePrefix: envVars.oTelCollectorNamePrefix,
OTelColResourceSpecs: oTelColResourceSpecs,
OTelColExtraConfig: oTelColExtraConfig,
SendBatchMaxSize: envVars.sendBatchMaxSize,
IsIPv6Cluster: isIPv6Cluster,
IsDocker: isDocker,
34 changes: 30 additions & 4 deletions helm-chart/dash0-operator/README.md
@@ -494,15 +494,41 @@ By default, the operator collects metrics as follows:

Disabling or enabling individual metrics via configuration is not supported.

### Preventing Operator Scheduling on Specific Nodes
### Controlling On Which Nodes the Operator's Collector Pods Are Scheduled

#### Allow Scheduling on Tainted Nodes

The operator uses a Kubernetes daemonset to deploy the OpenTelemetry collector on each node, in order to collect
telemetry from that node and from the workloads running on it.
If you use [taints](https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/) on certain nodes,
Kubernetes will not schedule pods there, preventing the daemonset collector pods from running on these nodes.
You can allow the daemonset collector pods to be scheduled on such nodes by configuring tolerations that match your
taints.
Tolerations can be configured as follows:
```yaml
operator:
collectors:
daemonSetTolerations:
- key: key1
operator: Equal
value: value1
effect: NoSchedule
- key: key2
operator: Exists
effect: NoSchedule
```

Note: The tolerations will be added to the daemonset collector pods, but not to the deployment collector pod.
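For reference, the first toleration in the example above would match a node taint like the following (the node name is hypothetical):

```yaml
# Node carrying a taint that the `key1` toleration tolerates:
apiVersion: v1
kind: Node
metadata:
  name: worker-node-1
spec:
  taints:
    - key: key1
      value: value1
      effect: NoSchedule
```

Equivalently, such a taint could be added with `kubectl taint nodes worker-node-1 key1=value1:NoSchedule`.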

#### Preventing Operator Scheduling on Specific Nodes

All the pods deployed by the operator have a default node anti-affinity for the `dash0.com/enable=false` node label.
That is, if you add the `dash0.com/enable=false` label to a node, none of the pods owned by the operator will schedule
on that node.
That is, if you add the `dash0.com/enable=false` label to a node, none of the pods owned by the operator will be
scheduled on that node.

**IMPORTANT:** This includes the daemonset that the operator will set up to receive telemetry from the pods, which might
lead to situations in which instrumented pods cannot send telemetry because the local node does not have a daemonset
pod.
collector pod.
In other words, if you want to monitor workloads with the Dash0 operator and use the `dash0.com/enable=false` node
anti-affinity, make sure that the workloads you want to monitor have the same anti-affinity:

@@ -0,0 +1,50 @@
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ template "dash0-operator.collectorResourceConfigMapName" . }}
  namespace: {{ .Release.Namespace }}
  labels:
    app.kubernetes.io/name: dash0-operator
    app.kubernetes.io/component: controller
    app.kubernetes.io/instance: collector-extra-config-map
    {{- include "dash0-operator.labels" . | nindent 4 }}

{{/*
Note: Until version 0.45.1, the collector resource values were supposed to be defined at
operator.collectorDaemonSetCollectorContainerResources; we still allow these values for backwards
compatibility.
*/}}
data:
  otelcolextra.yaml: |-
    collectorDaemonSetCollectorContainerResources:
    {{- toYaml (
          default
            .Values.operator.collectors.daemonSetCollectorContainerResources
            .Values.operator.collectorDaemonSetCollectorContainerResources
        ) | nindent 6 }}
    collectorDaemonSetConfigurationReloaderContainerResources:
    {{- toYaml (
          default
            .Values.operator.collectors.daemonSetConfigurationReloaderContainerResources
            .Values.operator.collectorDaemonSetConfigurationReloaderContainerResources
        ) | nindent 6 }}
    collectorDaemonSetFileLogOffsetSynchContainerResources:
    {{- toYaml (
          default
            .Values.operator.collectors.daemonSetFileLogOffsetSynchContainerResources
            .Values.operator.collectorDaemonSetFileLogOffsetSynchContainerResources
        ) | nindent 6 }}
    collectorDeploymentCollectorContainerResources:
    {{- toYaml (
          default
            .Values.operator.collectors.deploymentCollectorContainerResources
            .Values.operator.collectorDeploymentCollectorContainerResources
        ) | nindent 6 }}
    collectorDeploymentConfigurationReloaderContainerResources:
    {{- toYaml (
          default
            .Values.operator.collectors.deploymentConfigurationReloaderContainerResources
            .Values.operator.collectorDeploymentConfigurationReloaderContainerResources
        ) | nindent 6 }}
    daemonSetTolerations:
    {{- toYaml .Values.operator.collectors.daemonSetTolerations | nindent 6 }}
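The `default` calls in this template implement the backwards-compatibility fallback: Sprig's `default new legacy` returns the legacy value when it is non-empty, and the new value otherwise. A minimal Go sketch of that precedence (hypothetical helper, not part of the operator's code base):

```go
package main

import "fmt"

// resolve mirrors the precedence of Sprig's `default new legacy` as used in
// the template: the legacy top-level Helm setting wins when it is set
// (non-empty), otherwise the value from the new operator.collectors.*
// location is used.
func resolve(newValue, legacyValue string) string {
	if legacyValue != "" {
		return legacyValue
	}
	return newValue
}

func main() {
	// Legacy setting still present: it takes precedence.
	fmt.Println(resolve("memory: 500Mi", "memory: 400Mi")) // memory: 400Mi
	// Only the new location is set: it is used.
	fmt.Println(resolve("memory: 500Mi", "")) // memory: 500Mi
}
```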

This file was deleted.

@@ -1,8 +1,8 @@
otelcol resources config map should match snapshot:
otelcol extra config map should match snapshot:
1: |
apiVersion: v1
data:
otelcolresources.yaml: |-
otelcolextra.yaml: |-
collectorDaemonSetCollectorContainerResources:
gomemlimit: 400MiB
limits:
@@ -21,7 +21,6 @@ otelcol resources config map should match snapshot:
memory: 32Mi
requests:
memory: 32Mi

collectorDeploymentCollectorContainerResources:
gomemlimit: 400MiB
limits:
@@ -34,11 +33,13 @@ otelcol resources config map should match snapshot:
memory: 12Mi
requests:
memory: 12Mi
daemonSetTolerations:
[]
kind: ConfigMap
metadata:
labels:
app.kubernetes.io/component: controller
app.kubernetes.io/instance: collector-resources-config-map
app.kubernetes.io/instance: collector-extra-config-map
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/name: dash0-operator
app.kubernetes.io/part-of: dash0-operator
