Otel collector not scraping metrics properly #37281

Closed
gaur-piyush opened this issue Jan 17, 2025 · 15 comments
Labels: exporter/googlecloud, question (Further information is requested), receiver/prometheus (Prometheus receiver)

Comments

@gaur-piyush commented Jan 17, 2025

Component(s)

exporter/googlecloud

Describe the issue you're reporting

Hi Team,

We're using the Google Cloud exporter in our Otel configuration. However, we have seen that the Otel collector doesn't scrape all of the metrics. Can you help us identify the problem? I couldn't find an issue on the configuration side.

Attached is the configuration currently in use.

Regards,
Piyush Gaur

mode: deployment
presets:
    logsCollection:
        enabled: false
        includeCollectorLogs: false
        storeCheckpoints: false
    hostMetrics:
        enabled: false
    kubernetesAttributes:
        enabled: false
    kubernetesEvents:
        enabled: false
    clusterMetrics:
        enabled: false
    kubeletMetrics:
        enabled: false
configMap:
    create: true
config:
    exporters:
        debug:
            verbosity: normal
        googlecloud:
            project: PROJECT_ID
            metric:
                prefix: custom.googleapis.com
            sending_queue:
                enabled: true
                queue_size: 20000
    extensions:
        health_check:
          endpoint: 0.0.0.0:13133
    processors:
        batch:
            send_batch_size: 8192
            send_batch_max_size: 10000
            timeout: 10s
        memory_limiter:
            check_interval: 5s
            limit_percentage: 80
            spike_limit_percentage: 30
        resourcedetection:
            detectors: [gcp]
            timeout: 10s
        filter/cv:
            metrics:
                include:
                    match_type: strict
                    metric_names:
                        - LIST_OF_METRICS
    receivers:
        prometheus:
            config:
                scrape_configs:
                    - job_name: test
                      metrics_path: /metrics
                      scrape_interval: 300s
                      kubernetes_sd_configs:
                      - role: endpoints
                        namespaces:
                          names:
                            - test
                      relabel_configs:
                        - source_labels: [__meta_kubernetes_endpoints_name]
                          action: keep
                          regex: cloud-volumes-infrastructure
                        - source_labels: [__meta_kubernetes_namespace]
                          action: replace
                          target_label: namespace
                        - source_labels: [__meta_kubernetes_service_name]
                          action: replace
                          target_label: service
                        - source_labels: [__meta_kubernetes_pod_node_name]
                          action: replace
                          target_label: node                  
    service:
        extensions:
            - health_check
        pipelines:
            metrics:
                receivers: [prometheus]
                processors: [filter/cv, resourcedetection]
                exporters: [googlecloud, debug]
command:
    name: otelcol-contrib
    extraArgs: []
serviceAccount:
    create: true
    annotations: {}
    name: "open-telemetry-sa"
clusterRole:
    create: true
    annotations: {}
    name: ""
    rules:
    - apiGroups:
      - "apps"
      - ""
      resources:
      - 'nodes'
      - 'nodes/proxy'
      - 'nodes/metrics'
      - 'services'
      - 'endpoints'
      - 'pods'
      - 'ingresses'
      - 'configmaps'
      verbs:
      - 'get'
      - 'list'
      - 'watch'
    - apiGroups:
      - extensions
      - networking.k8s.io
      resources:
      - ingresses/status
      - ingresses
      verbs:
      - get
      - list
      - watch
    - nonResourceURLs:
      - /metrics
      verbs:
       - 'get'
       - 'list'
       - 'watch'
    clusterRoleBinding:
        annotations: {}
        name: ""
podSecurityContext: {}
securityContext: {}
nodeSelector: {}
tolerations: []
affinity: {}
topologySpreadConstraints: []
priorityClassName: ""
extraEnvs: []
extraVolumes: []
extraVolumeMounts: []
ports:
    otlp:
        enabled: true
        containerPort: 4317
        servicePort: 4317
        hostPort: 4317
        protocol: TCP
    otlp-http:
        enabled: true
        containerPort: 4318
        servicePort: 4318
        hostPort: 4318
        protocol: TCP
    otlp-lb:
        enabled: true
        containerPort: 55681
        servicePort: 55681
        hostPort: 55681
        protocol: TCP
    metrics:
        enabled: true
        containerPort: 8888
        servicePort: 8888
        protocol: TCP
resources: {}
podAnnotations: {}
podLabels: {}
hostNetwork: false
dnsPolicy: ""
replicaCount: 3
revisionHistoryLimit: 10
annotations: {}
extraContainers: []
initContainers: []
lifecycleHooks: {}
service:
    type: ClusterIP
    annotations: {}
ingress:
    enabled: false
    additionalIngresses: []
podMonitor:
    enabled: false
    metricsEndpoints:
        - port: metrics
    extraLabels: {}
serviceMonitor:
    enabled: false
    metricsEndpoints:
        - port: metrics
    extraLabels: {}
podDisruptionBudget:
    enabled: false
autoscaling:
    enabled: false
    minReplicas: 1
    maxReplicas: 10
    behavior: {}
    targetCPUUtilizationPercentage: 80
rollout:
    rollingUpdate: {}
    strategy: RollingUpdate
prometheusRule:
    enabled: false
    groups: []
    defaultRules:
        enabled: false
    extraLabels: {}
statefulset:
    volumeClaimTemplates: []
    podManagementPolicy: "Parallel"
networkPolicy:
    enabled: false
    annotations: {}
    allowIngressFrom: []
    extraIngressRules: []
    egressRules: []

values.txt

@gaur-piyush added the needs triage label on Jan 17, 2025

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@dashpole (Contributor)

When you say "Otel collector doesn't scrape all the metrics", what do you mean, specifically? Do you see errors in the logs? Are all metrics missing, or just some metrics (which ones?)

@dashpole self-assigned this on Jan 17, 2025
@dashpole removed the needs triage label on Jan 17, 2025
@dashpole (Contributor)

Also note that we generally recommend using the googlemanagedprometheus exporter for metrics over the googlecloud exporter. It has lower pricing and better support for PromQL.
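
For reference, a minimal sketch of what that swap could look like against the configuration above (assuming the same PROJECT_ID placeholder and the same metrics pipeline; other exporter options are omitted):

exporters:
    googlemanagedprometheus:
        # Same GCP project placeholder as the googlecloud exporter above.
        project: PROJECT_ID
service:
    pipelines:
        metrics:
            receivers: [prometheus]
            processors: [filter/cv, resourcedetection]
            exporters: [googlemanagedprometheus, debug]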

@gaur-piyush (Author)

@dashpole It is not scraping some of the metrics. Even for the missing metrics, we can see that they were scraped for a while and then stopped.

@dashpole (Contributor)

Is the prometheus receiver failing to scrape the metric, or is the googlecloud exporter failing to export it? Do you see any errors in the logs?

@gaur-piyush (Author)

@dashpole It is scraping metrics, but a few of the metrics are getting dropped. I checked and found that it might be due to the relabel config in our scrape config. I will update this issue; please keep it open for a while.
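
For context, a keep relabel rule drops every discovered target whose source labels do not match the regex, so the relabel_configs above only scrape endpoints named cloud-volumes-infrastructure in the test namespace. A minimal sketch of widening that match (the second endpoint name is a hypothetical placeholder):

relabel_configs:
    - source_labels: [__meta_kubernetes_endpoints_name]
      action: keep
      # Targets whose endpoints name does not match this regex are dropped before scraping.
      # another-endpoints-name stands in for a second endpoint you want to keep.
      regex: cloud-volumes-infrastructure|another-endpoints-name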

@gaur-piyush (Author)

@dashpole We keep seeing this error in our Otel pod logs although there is no issue with the metrics now.

2025-01-23T11:52:10.317Z warn internal/transaction.go:111 Failed to scrape Prometheus endpoint {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "scrape_timestamp": 1737633130315, "target_labels": "{__name__=\"up\", instance=\"100.73.178.42:15090\", job=\"\", namespace=\"\", node=\"\", service=\"\"}"}

I am unable to understand why we keep getting this.

@dashpole (Contributor)

That means the prometheus receiver is unable to scrape one of the targets. You will need to enable debug logging in the collector to see the detailed error message.

@dashpole added the receiver/prometheus and question labels on Jan 23, 2025

Pinging code owners for receiver/prometheus: @Aneurysm9 @dashpole. See Adding Labels via Comments if you do not have permissions to add labels yourself. For example, comment '/label priority:p2 -needs-triaged' to set the priority and remove the needs-triaged label.

@gaur-piyush (Author)

@dashpole I have enabled detailed logging, but I am not seeing anything else for the error mentioned above. Am I missing something here?

@dashpole (Contributor)

You did this, right?

service:
  telemetry:
    metrics:
      level: detailed

@gaur-piyush (Author)

You did this, right?

service:
  telemetry:
    metrics:
      level: detailed

Apologies, I did it for Otel, not here. Let me make the changes and see what's going on.

@gaur-piyush (Author) commented Jan 23, 2025

I have enabled the detailed level for metrics telemetry, but I am not seeing any detailed information for the above error.

service:
  telemetry:
    metrics:
      level: detailed

@dashpole (Contributor)

Oh, my bad. I copy-pasted the wrong thing from https://opentelemetry.io/docs/collector/internal-telemetry/#configure-internal-logs. It should be:

service:
  telemetry:
    logs:
      level: debug

@gaur-piyush (Author)

@dashpole We're good now. With debug logging enabled, we were able to identify the root cause of the error. Thanks a ton.
