Error in getting metrics & prometheus plugin after bumping to the 3.7.1 release #14160

rodolfobrunner opened this issue Jan 14, 2025 · 2 comments

@rodolfobrunner

Is there an existing issue for this?

  • I have searched the existing issues

Kong version ($ kong version)

3.7.1 / 3.9.0

Current Behavior

I am having problems with metrics and the prometheus plugin after bumping to the 3.7.1 release. (I have since bumped Kong up to 3.9.0 and the issue still persists.)

I have the following entry in my logs:
[lua] prometheus.lua:1020: log_error(): Error getting 'request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="00080.0"}': nil, client: 10.145.40.1, server: kong_status, request: "GET /metrics HTTP/1.1", host: "10.145.12.54:8100"
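
As far as I can tell, that log line comes from the metric collection path in the nginx-lua-prometheus library that Kong bundles: when /metrics is scraped, the library walks its list of known metric keys and reads each one back from the prometheus_metrics shared dict, logging exactly this error when the read comes back nil. A simplified sketch of that path (paraphrased, not the library's exact code):

local dict = ngx.shared.prometheus_metrics

local function collect_metrics()
  -- The library first enumerates the known keys, then reads each
  -- value back from the shared dict individually.
  local keys = dict:get_keys(0)
  for _, key in ipairs(keys) do
    local value, err = dict:get(key)
    if value then
      ngx.print(key, " ", value, "\n")
    else
      -- This is the line we see in our logs: the key is still listed,
      -- but its value is no longer in the shared dict ("nil").
      ngx.log(ngx.ERR, "Error getting '", key, "': ", tostring(err))
    end
  end
end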

Interesting facts:

  • It is always the same service and route that hit this error; in my case it is always the same two routes, for the same bucket.
  • When we revert to 3.6.1, the problem goes away.
  • After a few months we bumped Kong to 3.9.0, and the problem started happening again after a couple of hours, for the same routes and buckets.
  • It goes away with a pod rotation but comes back after a while.

I have already tried:

  • Increasing the Prometheus shared dict with nginx_http_lua_shared_dict: 'prometheus_metrics 15m'; memory usage of the dict now stands at roughly 20% (see the snippet after this list for how we set it).
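
For reference, this is roughly how we apply that setting (the kong.conf property and its environment-variable form; the exact placement in our Helm values is paraphrased):

# kong.conf form:
nginx_http_lua_shared_dict = prometheus_metrics 15m

# equivalent container environment variable (how we set it on EKS):
KONG_NGINX_HTTP_LUA_SHARED_DICT=prometheus_metrics 15m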

One pod contains:

kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="50"} 1
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="80"} 1
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="100"} 1
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="250"} 2
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="400"} 2
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="700"} 2
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="1000"} 2
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="2000"} 2
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="5000"} 2
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="10000"} 2
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="30000"} 2
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="60000"} 2
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="+Inf"} 2

While another pod is missing the le="80" bucket:

kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="50"} 1
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="100"} 2
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="250"} 2
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="400"} 3
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="700"} 3
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="1000"} 3
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="2000"} 3
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="5000"} 3
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="10000"} 3
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="30000"} 3
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="60000"} 3
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="+Inf"} 3

We are running Kong on AWS EKS, upgraded from 3.6.1.

Expected Behavior

The bucket should not disappear, but if it does for any reason, I would expect Kong to be able to recover from the inconsistent state (maybe by resetting the metric?).
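
To illustrate what I mean by a reset: as far as I know Kong does not expose anything like this today, but conceptually it would just mean clearing the backing shared dict so the stored values and the key list agree again, roughly along these lines (illustration only, not a supported API; the dict name is the default prometheus_metrics):

-- Illustration only: the rough idea behind a "metric reset".
-- Not a supported Kong/plugin API; it only shows what bringing the
-- shared dict back to a consistent (empty) state would amount to.
local dict = ngx.shared.prometheus_metrics

dict:flush_all()      -- mark every stored item as expired
dict:flush_expired()  -- free the memory held by expired items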

Steps To Reproduce

No response

Anything else?

No response

@ProBrian
Contributor

@rodolfobrunner Does this issue happen while using the same deployment as #14144? I'm trying to reproduce it.

@brunomiguelsantos

Hey @ProBrian, I am part of the same team as @rodolfobrunner. Yes, it's the same deployment.
