
[bitnami/rabbitmq] Cluster on Kubernetes Failing with one node go down #32162

Status: Open

JalisDiehl opened this issue Feb 25, 2025 · 2 comments

Labels: rabbitmq, tech-issues (The user has a technical issue about an application), triage (Triage is needed)

Comments

@JalisDiehl

Name and Version

3.17.3

What architecture are you using?

amd64

What steps will reproduce the bug?

We are running RabbitMQ on a Kubernetes cluster. We started with spot instances, 3 pods, and 3 PVCs; when one node goes down, the whole cluster seems to stop.
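For context, a node "going down" here is typically a spot reclaim. The original spot placement would have looked roughly like the hypothetical sketch below (the actual values, shown in the next section, now pin the pods to on-demand capacity instead):

    nodeSelector:
      # Hypothetical reconstruction of the initial setup described above;
      # spot capacity can be reclaimed at any time, taking the node down.
      karpenter.sh/capacity-type: spot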

Are you using any custom parameters or values?

  valuesInline:
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
            - matchExpressions: 
              - key: "karpenter.sh/capacity-type"
                operator: "In"
                values: ["on-demand"]  
    nodeSelector: 
      karpenter.sh/capacity-type: on-demand
    clustering:
      forceBoot: true
    podManagementPolicy: "Parallel"
    resources:
      requests:
        cpu: 100m
        memory: 812Mi
      limits:
        cpu: 2
        memory: 3072Mi    
    replicaCount: 3
    persistence:
      size: 40Gi
    auth:
      username: xpto
      password: bar
      erlangCookie: foo
      tls: 
        enabled: true
        failIfNoPeerCert: false
        existingSecret: "certtls"
        existingSecretFullChain: true
        sslOptionsVerify: "verify_none"
    service:
      type: LoadBalancer
      annotations:
        service.beta.kubernetes.io/aws-load-balancer-type: nlb
        service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
        external-dns.alpha.kubernetes.io/hostname: dns-value
        service.beta.kubernetes.io/aws-load-balancer-ssl-cert: ""
        service.beta.kubernetes.io/aws-load-balancer-ssl-ports: "5671"
        prometheus.io/scrape: "true"
        prometheus.io/port: "9419"
        prometheus.io/path: "/metrics/per-object"
      epmdPortEnabled: false
      distPortEnabled: false
    ingress:
      enabled: true
      hostname: xpto
      path: /
      tls: true
      ingressClassName: nginx
      annotations:
        cert-manager.io/cluster-issuer: letsencrypt
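
Two chart-level settings are worth calling out against this failure mode. The following is a sketch, assuming the chart's standard pdb block; the key names should be verified against the chart version in use:

  valuesInline:
    # Hypothetical addition, not in the report above: a PodDisruptionBudget
    # keeps voluntary evictions (e.g. node drains) from removing more than
    # one of the 3 replicas at a time.
    pdb:
      create: true
      maxUnavailable: 1
    clustering:
      # Already set in the values above. Caveat: forceBoot lets a node start
      # without waiting for the peers it last saw, which can mask the
      # underlying clustering failure after a node loss.
      forceBoot: true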

What is the expected behavior?

The cluster should keep working when a single node goes down.

What do you see instead?

The whole RabbitMQ cluster goes down.

Additional information

(Screenshot attached.)

JalisDiehl added the tech-issues (The user has a technical issue about an application) label on Feb 25, 2025
github-actions bot added the triage (Triage is needed) label on Feb 25, 2025
javsalgar changed the title from "Rabbitmq Cluster on Kubernetes Failing with one node go down" to "[bitnami/rabbbitmq] Cluster on Kubernetes Failing with one node go down" on Feb 26, 2025
javsalgar changed the title to "[bitnami/rabbitmq] Cluster on Kubernetes Failing with one node go down" on Feb 26, 2025
@javsalgar (Contributor)

Hi,

Could you share the logs of the instances?
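
If the pods die before logging anything useful, one option for getting more detail into kubectl logs is raising the console log level. This is a sketch, assuming the chart's extraConfiguration value (which is appended to rabbitmq.conf); verify the key against the chart README:

  valuesInline:
    # Assumption: extraConfiguration is appended verbatim to rabbitmq.conf.
    extraConfiguration: |
      log.console = true
      log.console.level = debug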

@JalisDiehl (Author)

There are no logs; the pods just keep trying to start again.
When it happens again, I will try to collect them.
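
For pods that restart too quickly to inspect, kubectl logs --previous can pull the last crashed container's output. As a further sketch, assuming the standard Bitnami diagnosticMode block is present in this chart version, the pods can also be held open for manual debugging:

  valuesInline:
    # Assumption: diagnosticMode replaces the container command with
    # "sleep infinity", so the pod stays Running and rabbitmq-server can
    # be started by hand inside it for inspection.
    diagnosticMode:
      enabled: true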
