
Collector sending queue value not getting reflected #37445

Open
adithya-r-nathan opened this issue Jan 23, 2025 · 4 comments
Labels
bug, exporter/clickhouse, needs triage

Comments

@adithya-r-nathan

adithya-r-nathan commented Jan 23, 2025

Component(s)

exporter/clickhouse

What happened?

Description

The sending_queue.queue_size configuration was set to 10000 for the OpenTelemetry Collector’s ClickHouse exporter. However, when checking the metrics, the queue size appears to be set to the default value of 1000. This discrepancy suggests that the configured value is not being applied as expected.
• Operator Version: 0.68.1
• Collector Image: otel/opentelemetry-collector-contrib:0.109.0

Steps to Reproduce

1. Deploy an OpenTelemetry Collector using the following configuration:
exporters:
  clickhouse:
    endpoint: tcp://{{.Values.endpoints }}
    username: {{.Values.db_user | quote }}
    database: {{.Values.database }}
    password: {{.Values.db_pw | quote }}
    logs_table_name: {{ .Values.logtable }}
    traces_table_name: otel_traces
    create_schema: true
    ttl: 12h
    timeout: 10s
    sending_queue:
      queue_size: 10000
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_interval: 30s
      max_elapsed_time: 300s
2. Verify that the deployment uses:
• OpenTelemetry Operator: 0.68.1
• Collector Image: otel/opentelemetry-collector-contrib:0.109.0
3. Monitor the metrics using the /metrics endpoint or Prometheus (see the telemetry sketch after this list).
4. Look for the exporter_send_queue_size metric.
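For reference, the collector's internal metrics endpoint used in steps 3 and 4 is assumed to be exposed through the service telemetry section, roughly as in this sketch (the 0.0.0.0:8888 address matches the instance label in the metric sample below and the ConfigMap shared later in this thread):

service:
  telemetry:
    metrics:
      # internal telemetry endpoint that serves the otelcol_exporter_* metrics
      address: 0.0.0.0:8888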

Expected Result

The exporter_send_queue_size metric should reflect the configured value of 10000.

Actual Result

The exporter_send_queue_size metric shows the default value of 1000, indicating that the configuration was not applied.

Collector version

v0.109.0

Environment information

Environment

OS: AWS EKS (Managed Kubernetes Service)
Cluster Version: 1.30

OpenTelemetry Collector configuration

exporters:
      clickhouse:
        endpoint: tcp://{{.Values.endpoints }}
        username: {{.Values.db_user | quote }}
        database: {{.Values.database }}
        password: {{.Values.db_pw | quote }}
        logs_table_name:  {{ .Values.logtable }}
        traces_table_name: otel_traces
        create_schema: true
        ttl: 12h
        timeout: 10s
        sending_queue:
          queue_size: 10000
        retry_on_failure:
          enabled: true
          initial_interval: 5s
          max_interval: 30s
          max_elapsed_time: 300s

Log output

{
    "metric": {
      "__name__": "otelcol_exporter_queue_capacity",
      "app_kubernetes_io_component": "opentelemetry-collector",
      "app_kubernetes_io_instance": "otel-agent.otel-log-agent",
      "app_kubernetes_io_managed_by": "opentelemetry-operator",
      "app_kubernetes_io_name": "otel-log-agent-collector",
      "app_kubernetes_io_part_of": "opentelemetry",
      "app_kubernetes_io_version": "0.109.0",
      "argocd_argoproj_io_instance": "ms-prd-ff-otel-agent",
      "cluster": "ms-prd",
      "controller_revision_hash": "7fcbf5ff45",
      "eks_cluster": "ms-prd",
      "exporter": "otlp",
      "instance": "100.0.1.233:8888",
      "job": "kubernetes-pods",
      "kubernetes_namespace": "otel-agent",
      "kubernetes_pod_name": "otel-log-agent-collector-bmrl8",
      "pod_template_generation": "3",
      "prometheus": "vm-agent/vm-stack",
      "service_instance_id": "d743e776-a212-49fe-abc7-1733161f9ac3",
      "service_name": "otelcol-contrib",
      "service_version": "0.109.0"
    },
    "value": [
      1737646453.21,
      "1000"
    ],
    "group": 1
  }

Additional context


In Argo CD I can see the configuration set to 10000, but in the metrics it shows 1000 (the default value).

adithya-r-nathan added the bug and needs triage labels Jan 23, 2025

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@SpencerTorres
Member

Hey! Thanks for submitting an issue, this info is helpful.

The most recent change to this area that I can recall is #34176, but this simply follows the defaults.

It calls this which sets the current default to 1_000, and it is parsed under the correct field called queue_size.

It all seems correct; I wonder if it's somehow being reported to the metrics incorrectly? We can run it through a debugger to find out. I also see there's a field called enabled that is set to true by default; you can try adding it explicitly to make sure the queue isn't getting disabled at any point in the config parsing process.
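For example, a minimal sketch of that suggestion (only the queue section shown; the rest of your exporter config stays as posted) would be:

exporters:
  clickhouse:
    # ...other settings unchanged...
    sending_queue:
      enabled: true      # explicit, to rule out the queue being disabled during config parsing
      queue_size: 10000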

@adithya-r-nathan
Author

Hi @SpencerTorres, thank you for your response. Let me try that and get back to you.

@adithya-r-nathan
Author

adithya-r-nathan commented Jan 24, 2025

Hi @SpencerTorres, I tried setting enabled: true explicitly, but the value is still not reflected. Is there any other way to increase the queue, or do you have any suggestions?
Below is the ConfigMap from Argo CD:
apiVersion: v1
data:
  collector.yaml: |
    receivers:
      otlp:
        protocols:
          grpc:
            auth:
              authenticator: basicauth/server
            endpoint: 0.0.0.0:4317
          http:
            auth:
              authenticator: basicauth/server
            endpoint: 0.0.0.0:4318
    exporters:
      clickhouse:
        create_schema: false
        database: OTEL
        endpoint: tcp://k8s-vizhi-clickhou-bfcce.elb.ap-south-1.amazonaws.com:9000
        logs_table_name: otel_logs_v5
        password: 7d**********
        retry_on_failure:
          enabled: true
          initial_interval: 5s
          max_elapsed_time: 300s
          max_interval: 30s
        sending_queue:
          enabled: true
          queue_size: 10000
        timeout: 10s
        ttl: 12h
        username: adminuser
      debug:
        verbosity: detailed
    processors:
      batch: {}
      memory_limiter:
        check_interval: 5s
        limit_percentage: 85
        spike_limit_percentage: 25
      routing:
        default_exporters:
          - clickhouse
        error_mode: ignore
        table:
          - exporters:
              - clickhouse
            statement: route() where IsMatch(resource.attributes["index"], ".*")
    extensions:
      basicauth/server:
        htpasswd:
          inline: |
            username1:password1
      health_check:
        endpoint: ${env:MY_POD_IP}:13133
    service:
      extensions:
        - basicauth/server
        - health_check
      telemetry:
        metrics:
          address: 0.0.0.0:8888
      pipelines:
        logs:
          exporters:
            - debug
            - clickhouse
          processors:
            - routing
            - batch
            - memory_limiter
          receivers:
            - otlp
kind: ConfigMap
metadata:
  creationTimestamp: '2025-01-24T12:32:19Z'
  labels:
    app.kubernetes.io/component: opentelemetry-collector
    app.kubernetes.io/instance: vizhi.otel-collector
    app.kubernetes.io/managed-by: opentelemetry-operator
    app.kubernetes.io/name: otel-collector-collector
    app.kubernetes.io/part-of: opentelemetry
    app.kubernetes.io/version: latest
    argocd.argoproj.io/instance: nonprod-m2p-tools-otel-collector
  name: otel-collector-collector-5f19bd5a
  namespace: vizhi
  ownerReferences:
    - apiVersion: opentelemetry.io/v1beta1
      blockOwnerDeletion: true
      controller: true
      kind: OpenTelemetryCollector
      name: otel-collector
      uid: 09c7ef15-3d31-47f1-8a50-da3a549f7bb6
  resourceVersion: '2612681762'
  uid: 431f388f-369e-4c11-a526-ac75d48f55b5
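As a side note, and purely as a sketch rather than a confirmed fix: if I recall correctly, the exporter helper's sending_queue also exposes a num_consumers setting, so a tuned queue block could look like this (the value shown is illustrative):

    sending_queue:
      enabled: true
      num_consumers: 10    # illustrative; number of consumers draining the queue in parallel
      queue_size: 10000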

