
operator constantly tries to reconcile FluentbitAgent because it tries to update daemonset labels #1837

Open
siimaus opened this issue Oct 31, 2024 · 8 comments
Labels
bug Something isn't working
Milestone

Comments

@siimaus

siimaus commented Oct 31, 2024

Logging operator constantly reports the following error:

DaemonSet.apps "rancher-logging-eks-fluentbit" is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{"app.kubernetes.io/instance":"rancher-logging-eks", "app.kubernetes.io/managed-by":"rancher-logging-eks", "app.kubernetes.io/name":"fluentbit"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable

When I enable workload recreation, as suggested in the logs, by setting the Logging object's
spec.enableRecreateWorkloadOnImmutableFieldChange: true
then the operator constantly recreates the fluent-bit agent DaemonSets, either on some schedule or when the operator is restarted. Deleting the CRD and recreating it anew does not solve the issue.
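For reference, this is the flag the log message refers to; it is set on the Logging resource (the resource name here matches the error above, the rest is a minimal illustrative fragment):

```yaml
apiVersion: logging.banzaicloud.io/v1beta1
kind: Logging
metadata:
  name: rancher-logging-eks
spec:
  # Allows the operator to delete and recreate workloads when an
  # immutable field (such as DaemonSet spec.selector) changes.
  enableRecreateWorkloadOnImmutableFieldChange: true
```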

The error seems to stem from an attempt to change the DaemonSet selector, but I am not familiar enough with the operator code to say why it tries to change the selector.

Describe the bug:
The operator constantly tries to recreate the fluent-bit DaemonSets because reconciliation fails.
Expected behaviour:
No attempts to recreate the DaemonSets unless the FluentbitAgent CRD has changed.

Steps to reproduce the bug:

  1. Install Rancher control plane (v2.9.0) on EKS
  2. Add managed EKS cluster
  3. Install Rancher Logging via Rancher apps.


Environment details:

{"level":"error","ts":"2024-10-31T13:03:37Z","msg":"Reconciler error","controller":"logging","controllerGroup":"logging.banzaicloud.io","controllerKind":"Logging","Logging":{"name":"rancher-logging-eks"},"namespace":"","name":"rancher-logging-eks","reconcileID":"158a57af-91ab-4485-a2dc-055889fc2e0e","error":"failed to reconcile resource: Object has to be recreated, but refusing to remove without explicitly being told so. Use logging.spec.enableRecreateWorkloadOnImmutableFieldChange to move on but make sure to understand the consequences. As of fluentd, to avoid data loss, make sure to use a persistent volume for buffers, which is the default, unless explicitly disabled or configured differently. As of fluent-bit, to avoid duplicated logs, make sure to configure a hostPath volume for the positions through logging.spec.fluentbit.spec.positiondb. : DaemonSet.apps "rancher-logging-eks-fluentbit" is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{"app.kubernetes.io/instance":"rancher-logging-eks", "app.kubernetes.io/managed-by":"rancher-logging-eks", "app.kubernetes.io/name":"fluentbit"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable","errorVerbose":"DaemonSet.apps "rancher-logging-eks-fluentbit" is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{"app.kubernetes.io/instance":"rancher-logging-eks", "app.kubernetes.io/managed-by":"rancher-logging-eks", "app.kubernetes.io/name":"fluentbit"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable\nObject has to be recreated, but refusing to remove without explicitly being told so. Use logging.spec.enableRecreateWorkloadOnImmutableFieldChange to move on but make sure to understand the consequences. As of fluentd, to avoid data loss, make sure to use a persistent volume for buffers, which is the default, unless explicitly disabled or configured differently. 
As of fluent-bit, to avoid duplicated logs, make sure to configure a hostPath volume for the positions through logging.spec.fluentbit.spec.positiondb. \ngithub.com/cisco-open/operator-tools/pkg/reconciler.(*GenericResourceReconciler).ReconcileResource\n\t/go/pkg/mod/github.com/cisco-open/[email protected]/pkg/reconciler/resource.go:515\ngithub.com/kube-logging/logging-operator/pkg/resources/fluentbit.(*Reconciler).Reconcile\n\t/usr/local/src/logging-operator/pkg/resources/fluentbit/fluentbit.go:149\ngithub.com/kube-logging/logging-operator/controllers/logging.(*LoggingReconciler).Reconcile\n\t/usr/local/src/logging-operator/controllers/logging/logging_controller.go:280\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:114\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:311\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:261\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:222\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1695\nfailed to reconcile resource\ngithub.com/kube-logging/logging-operator/pkg/resources/fluentbit.(*Reconciler).Reconcile\n\t/usr/local/src/logging-operator/pkg/resources/fluentbit/fluentbit.go:151\ngithub.com/kube-logging/logging-operator/controllers/logging.(*LoggingReconciler).Reconcile\n\t/usr/local/src/logging-operator/controllers/logging/logging_controller.go:280\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/[email 
protected]/pkg/internal/controller/controller.go:114\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:311\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:261\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:222\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1695","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:324\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:261\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:222"}

  • Resource definition (possibly in YAML format) that caused the issue, without sensitive data:
apiVersion: logging.banzaicloud.io/v1beta1
kind: FluentbitAgent
metadata:
  annotations:
    meta.helm.sh/release-name: rancher-logging
    meta.helm.sh/release-namespace: cattle-logging-system
  labels:
    app.kubernetes.io/instance: rancher-logging
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: rancher-logging
    app.kubernetes.io/version: 4.8.0
    helm.sh/chart: rancher-logging-104.1.1_up4.8.0
  name: rancher-logging-eks

spec:
  disableKubernetesFilter: true
  extraVolumeMounts:
    - destination: /var/log/messages
      readOnly: true
      source: /var/log/messages
  image:
    repository: rancher/mirrored-fluent-fluent-bit
    tag: 2.2.0
  inputTail:
    Buffer_Chunk_Size: 1MB
    Buffer_Max_Size: 5MB
    Parser: syslog
    Path: /var/log/messages
    Tag: eks
  nodeSelector:
    kubernetes.io/os: linux
  podPriorityClassName: system-cluster-critical
  tolerations:
    - operator: Exists

/kind bug

@siimaus siimaus added the bug Something isn't working label Oct 31, 2024
@pepov pepov added this to the 5.0 milestone Nov 25, 2024
@pepov pepov added the triage label Nov 25, 2024
@pepov
Member

pepov commented Nov 25, 2024

@siimaus thanks for the report and sorry for the long delay! Have you tried setting logging.spec.enableRecreateWorkloadOnImmutableFieldChange = true? This flag is required to let the operator recreate the agent daemonset in case there is a change.

@pepov pepov removed the triage label Nov 25, 2024
@pepov pepov removed this from the 5.0 milestone Nov 25, 2024
@csatib02 csatib02 modified the milestones: Fluentd, 5.x Nov 29, 2024
@pepov
Member

pepov commented Dec 2, 2024

@siimaus sorry, I was mixing things up. Can you please provide your Logging resource as well? If you let the resource be recreated, what exactly is the difference between the original and the recreated resource?

@jbiers

jbiers commented Dec 17, 2024

@pepov I'm also seeing this happening when I have multiple Logging resources, and consequently multiple FluentbitAgents.

If I enable spec.enableRecreateWorkloadOnImmutableFieldChange, it recreates the DaemonSet with a different spec.selector, specifically setting selector.matchLabels["app.kubernetes.io/managed-by"] to the other FluentbitAgent, not the one that is actually responsible for this DaemonSet.
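What this looks like mechanically, as a minimal stdlib-only Go sketch (the "other" label value is illustrative): once the selector the operator computes differs from the live one in any key, the apply is rejected because DaemonSet spec.selector is immutable, and the only way forward is delete-and-recreate.

```go
package main

import (
	"fmt"
	"reflect"
)

// selectorChanged reports whether the computed selector labels differ from
// the live ones. Since DaemonSet spec.selector is immutable, any difference
// forces the operator to delete and recreate the DaemonSet.
func selectorChanged(live, desired map[string]string) bool {
	return !reflect.DeepEqual(live, desired)
}

func main() {
	// Labels as seen in the error message above.
	live := map[string]string{
		"app.kubernetes.io/instance":   "rancher-logging-eks",
		"app.kubernetes.io/managed-by": "rancher-logging-eks",
		"app.kubernetes.io/name":       "fluentbit",
	}
	// With two Logging resources, managed-by can flip to the other
	// resource's name on the next reconcile (hypothetical value).
	desired := map[string]string{
		"app.kubernetes.io/instance":   "rancher-logging-eks",
		"app.kubernetes.io/managed-by": "rancher-logging-other",
		"app.kubernetes.io/name":       "fluentbit",
	}
	fmt.Println("needs recreate:", selectorChanged(live, desired)) // needs recreate: true
}
```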

@pepov
Member

pepov commented Dec 17, 2024

Do you use loggingRef to isolate your FluentbitAgents? Can you describe your setup in a bit more detail in general?

@maage

maage commented Jan 23, 2025

The bug here is that there are multiple implementations of how loggingRef is obtained for comparisons: in the Go code, in Kubernetes via label selectors, and in actual logic such as hostnames.

Depending on the code path, the value comes from the loggingRef field, the object's name, the empty string, or "default".
To work around this issue, set loggingRef explicitly on every component you use that supports it, for example FluentbitAgent, Logging, ClusterOutput, and ClusterFlow.
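That workaround sketched as manifests, with illustrative resource names and ref value (a consistent loggingRef on every component keeps all the derivation paths in agreement):

```yaml
apiVersion: logging.banzaicloud.io/v1beta1
kind: Logging
metadata:
  name: rancher-logging-eks
spec:
  loggingRef: rancher-logging-eks
---
apiVersion: logging.banzaicloud.io/v1beta1
kind: FluentbitAgent
metadata:
  name: rancher-logging-eks
spec:
  loggingRef: rancher-logging-eks
---
apiVersion: logging.banzaicloud.io/v1beta1
kind: ClusterFlow
metadata:
  name: all-logs
spec:
  loggingRef: rancher-logging-eks
```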

These use loggingRef in both directions,

func reconcileRequestsForLoggingRef(loggings []loggingv1beta1.Logging, loggingRef string) (reqs []reconcile.Request) {
	for _, l := range loggings {
		if l.Spec.LoggingRef == loggingRef {
			reqs = append(reqs, reconcile.Request{
and
	case *loggingv1beta1.ClusterOutput:
		return reconcileRequestsForLoggingRef(loggingList.Items, o.Spec.LoggingRef)
	case *loggingv1beta1.Output:
		return reconcileRequestsForLoggingRef(loggingList.Items, o.Spec.LoggingRef)
	case *loggingv1beta1.Flow:
		return reconcileRequestsForLoggingRef(loggingList.Items, o.Spec.LoggingRef)
	case *loggingv1beta1.ClusterFlow:
		return reconcileRequestsForLoggingRef(loggingList.Items, o.Spec.LoggingRef)
	case *loggingv1beta1.SyslogNGClusterOutput:
		return reconcileRequestsForLoggingRef(loggingList.Items, o.Spec.LoggingRef)
	case *loggingv1beta1.SyslogNGOutput:
		return reconcileRequestsForLoggingRef(loggingList.Items, o.Spec.LoggingRef)
	case *loggingv1beta1.SyslogNGClusterFlow:
		return reconcileRequestsForLoggingRef(loggingList.Items, o.Spec.LoggingRef)
	case *loggingv1beta1.SyslogNGFlow:
		return reconcileRequestsForLoggingRef(loggingList.Items, o.Spec.LoggingRef)
	case *loggingv1beta1.NodeAgent:
		return reconcileRequestsForLoggingRef(loggingList.Items, o.Spec.LoggingRef)
	case *loggingv1beta1.FluentbitAgent:
		return reconcileRequestsForLoggingRef(loggingList.Items, o.Spec.LoggingRef)

But secrets have their own implementation,

	case *corev1.Secret:
		r := regexp.MustCompile(`^logging\.banzaicloud\.io/(.*)`)
		var requestList []reconcile.Request
		for key := range o.Annotations {
			if result := r.FindStringSubmatch(key); len(result) > 1 {
				loggingRef := result[1]
				// When loggingRef is "default" we also trigger for the empty ("") loggingRef as well, because the empty string cannot be used in the annotation, thus "default" refers to the empty case.
				if loggingRef == "default" {
					requestList = append(requestList, reconcileRequestsForLoggingRef(loggingList.Items, "")...)
				}
				requestList = append(requestList, reconcileRequestsForLoggingRef(loggingList.Items, loggingRef)...)
			}
		}
		return requestList

ClusterFlow and ClusterOutput use loggingRef,

func (r LoggingResourceRepository) ClusterFlowsFor(ctx context.Context, logging v1beta1.Logging) ([]v1beta1.ClusterFlow, error) {
	var list v1beta1.ClusterFlowList
	if err := r.Client.List(ctx, &list, clusterResourceListOpts(logging)...); err != nil {
		return nil, err
	}
	var res []v1beta1.ClusterFlow
	for _, i := range list.Items {
		if i.Spec.LoggingRef == logging.Spec.LoggingRef {

func (r LoggingResourceRepository) ClusterOutputsFor(ctx context.Context, logging v1beta1.Logging) ([]v1beta1.ClusterOutput, error) {
	var list v1beta1.ClusterOutputList
	if err := r.Client.List(ctx, &list, clusterResourceListOpts(logging)...); err != nil {
		return nil, err
	}
	var res []v1beta1.ClusterOutput
	for _, i := range list.Items {
		if i.Spec.LoggingRef == logging.Spec.LoggingRef {

As does FluentbitAgent,

func (r LoggingResourceRepository) FluentbitsFor(ctx context.Context, logging v1beta1.Logging) ([]v1beta1.FluentbitAgent, error) {
	var list v1beta1.FluentbitAgentList
	if err := r.Client.List(ctx, &list); err != nil {
		return nil, err
	}
	var res []v1beta1.FluentbitAgent
	for _, i := range list.Items {
		if i.Spec.LoggingRef == logging.Spec.LoggingRef {

LoggingRoutes use source and loggingRef

func (r LoggingResourceRepository) LoggingRoutesFor(ctx context.Context, logging v1beta1.Logging) ([]v1beta1.LoggingRoute, error) {
	var list v1beta1.LoggingRouteList
	if err := r.Client.List(ctx, &list); err != nil {
		return nil, err
	}
	var res []v1beta1.LoggingRoute
	for _, i := range list.Items {
		if i.Spec.Source == logging.Spec.LoggingRef {

Fluentd uses name instead of loggingRef

func (l *Logging) GetFluentdLabels(component string, f FluentdSpec) map[string]string {
	return util.MergeLabels(
		f.Labels,
		map[string]string{
			"app.kubernetes.io/name":      "fluentd",
			"app.kubernetes.io/component": component,
		},
		GenerateLoggingRefLabels(l.ObjectMeta.GetName()),

func GenerateLoggingRefLabels(loggingRef string) map[string]string {
	return map[string]string{"app.kubernetes.io/managed-by": loggingRef}

ServiceMonitor uses name,

return &v1.ServiceMonitor{
	ObjectMeta: objectMetadata,
	Spec: v1.ServiceMonitorSpec{
		JobLabel:        "",
		TargetLabels:    nil,
		PodTargetLabels: nil,
		Endpoints: []v1.Endpoint{{
			Port:                 "http-metrics",
			Path:                 r.fluentbitSpec.Metrics.Path,
			HonorLabels:          r.fluentbitSpec.Metrics.ServiceMonitorConfig.HonorLabels,
			RelabelConfigs:       r.fluentbitSpec.Metrics.ServiceMonitorConfig.Relabelings,
			MetricRelabelConfigs: r.fluentbitSpec.Metrics.ServiceMonitorConfig.MetricsRelabelings,
			Scheme:               r.fluentbitSpec.Metrics.ServiceMonitorConfig.Scheme,
			TLSConfig:            r.fluentbitSpec.Metrics.ServiceMonitorConfig.TLSConfig,
		}},
		Selector: v12.LabelSelector{
			MatchLabels: util.MergeLabels(r.fluentbitSpec.Labels, r.getFluentBitLabels(), generateLoggingRefLabels(r.Logging.GetName())),

Secrets use loggingRef or "default"

var loggingRef string
if r.Logging.Spec.LoggingRef != "" {
	loggingRef = r.Logging.Spec.LoggingRef
} else {
	loggingRef = "default"

Main uses loggingRef if set

logging-operator/main.go

Lines 333 to 334 in c6bf514

if loggingRef != "" {
labelSelector = labels.Set{"app.kubernetes.io/managed-by": loggingRef}.AsSelector()

Fluentbit uses name

func generateLoggingRefLabels(loggingRef string) map[string]string {
	return map[string]string{"app.kubernetes.io/managed-by": loggingRef}
}

func (r *Reconciler) getFluentBitLabels() map[string]string {
	return util.MergeLabels(
		r.fluentbitSpec.Labels,
		map[string]string{
			"app.kubernetes.io/instance": r.nameProvider.Name(),
			"app.kubernetes.io/name":     "fluentbit",
		},
		generateLoggingRefLabels(r.Logging.GetName()))
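To make the mismatch concrete, a minimal stdlib-only sketch (the helper functions are hypothetical simplifications of the snippets above): the watch selector in main.go is built from spec.loggingRef, while the fluent-bit labels are built from metadata.name, so the two only agree when loggingRef happens to equal the name.

```go
package main

import "fmt"

// selectorManagedBy mimics the main.go path: the label selector is derived
// from spec.loggingRef (hypothetical simplification).
func selectorManagedBy(loggingRef string) string {
	return loggingRef
}

// labelManagedBy mimics the getFluentBitLabels path: the DaemonSet labels
// are derived from the Logging object's metadata.name.
func labelManagedBy(loggingName string) string {
	return loggingName
}

func main() {
	// A Logging named "rancher-logging-eks" with an empty (default) loggingRef:
	name, ref := "rancher-logging-eks", ""
	fmt.Printf("selector wants %q, labels carry %q, agree: %v\n",
		selectorManagedBy(ref), labelManagedBy(name),
		selectorManagedBy(ref) == labelManagedBy(name))
}
```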

@pepov
Member

pepov commented Feb 3, 2025

Yes, unfortunately this has been handled differently throughout the code and hasn't been fixed since. I think it will be non-trivial to fix in a compatible manner. For now, can someone give us a reproducible example?

@skanakal

@pepov setting loggingRef in values.yaml works for me...

logging:

  # -- Logging resources are disabled by default
  enabled: false

  # -- Reference to the logging system. Each of the loggingRefs can manage a fluentbit daemonset and a fluentd statefulset.
  loggingRef: "xxxxxxx"

@pszczypta-autopay

I had the same problem. Setting spec.enableRecreateWorkloadOnImmutableFieldChange: true solved the issue.
