### Problem

We are running ClickHouse and Vector in Kubernetes. Vector receives events from Kafka and sinks them to ClickHouse. When updating ClickHouse, and thereby rolling out new Pods, data keeps flowing in without drops, but Vector starts emitting WARN messages that don't go away on their own. The 503s look to me as if Vector is still trying to reach the old ClickHouse instances. Could this be possible? Failed requests are retried and written successfully. Requests seem to time out after 10 or 30 s, as can be seen in the istio-proxy logs below. Restarting Vector helps, but that can't be the right solution.

```text
2025-01-03T07:48:00.264670Z WARN sink{component_kind="sink" component_id=clickhouse_sink component_type=clickhouse}:request{request_id=15212892}: vector::sinks::util::retries: Internal log [Retrying after response.] has been suppressed 22 times.
2025-01-03T07:48:00.264683Z WARN sink{component_kind="sink" component_id=clickhouse_sink component_type=clickhouse}:request{request_id=15212892}: vector::sinks::util::retries: Retrying after response. reason=503 Service Unavailable: upstream connect error or disconnect/reset before headers. retried and the latest reset reason: connection timeout internal_log_rate_limit=true
2025-01-03T07:48:00.264715Z WARN sink{component_kind="sink" component_id=clickhouse_sink component_type=clickhouse}:request{request_id=15212891}: vector::sinks::util::retries: Internal log [Retrying after response.] is being suppressed to avoid flooding.
2025-01-03T07:48:11.196311Z WARN sink{component_kind="sink" component_id=clickhouse_sink component_type=clickhouse}:request{request_id=15213093}: vector::sinks::util::retries: Internal log [Retrying after response.] has been suppressed 24 times.
2025-01-03T07:48:11.196327Z WARN sink{component_kind="sink" component_id=clickhouse_sink component_type=clickhouse}:request{request_id=15213093}: vector::sinks::util::retries: Retrying after response. reason=503 Service Unavailable: upstream connect error or disconnect/reset before headers. retried and the latest reset reason: connection timeout internal_log_rate_limit=true
2025-01-03T07:48:11.196372Z WARN sink{component_kind="sink" component_id=clickhouse_sink component_type=clickhouse}:request{request_id=15213091}: vector::sinks::util::retries: Internal log [Retrying after response.] is being suppressed to avoid flooding.
2025-01-03T07:48:22.164840Z WARN sink{component_kind="sink" component_id=clickhouse_sink component_type=clickhouse}:request{request_id=15213341}: vector::sinks::util::retries: Internal log [Retrying after response.] has been suppressed 29 times.
```
### Configuration
```yaml
clickhouse_sink:
  type: clickhouse
  compression: zstd
  healthcheck:
    enabled: true
  acknowledgements:
    enabled: true
  inputs:
    - clickhouse_tables
  endpoint: http://service-clickhouse-prod.clickhouse-prod.svc.cluster.local:8123
  format: json_each_row
  database: ep_sdc_edge
  table: '{{ "{{kafkaSrcTopicToDbName}}" }}'
  skip_unknown_fields: true
  auth:
    strategy: basic
    user: ${DEST_CLICKHOUSE_USERNAME}
    password: ${DEST_CLICKHOUSE_PASSWORD}
  date_time_best_effort: true
  encoding:
    timestamp_format: rfc3339
  batch:
    max_events: 2500
    timeout_secs: 2
```

### Version

0.42.0-distroless-libc

### Debug Output

Debug logs of one request that fails after rolling the ClickHouse Pods (`k logs vector pod -f | grep "34123"`):
2025-01-03T09:53:57.063453Z DEBUG sink{component_kind="sink" component_id=clickhouse_sink component_type=clickhouse}:request{request_id=34123}:http: vector::internal_events::http_client: Sending HTTP request. uri=http://clickhouse-clickhouse.clickhouse.svc.cluster.local:8123/?input_format_import_nested_json=1&input_format_skip_unknown_fields=1&date_time_input_format=best_effort&query=INSERT+INTO+%22ep_sdc_edge%22.%22some_table%22+FORMAT+JSONEachRow method=POST version=HTTP/1.1 headers={"content-type": "application/x-ndjson", "content-length": "804", "content-encoding": "zstd", "authorization": Sensitive, "accept-encoding": "zstd,gzip,deflate,br", "user-agent": "Vector/0.43.1 (x86_64-unknown-linux-gnu e30bf1f 2024-12-10 16:14:47.175528383)"} body=[804 bytes]
2025-01-03T09:53:57.063469Z DEBUG sink{component_kind="sink" component_id=clickhouse_sink component_type=clickhouse}:request{request_id=34123}:http: hyper::client::pool: reuse idle connection for ("http", clickhouse-clickhouse.clickhouse.svc.cluster.local:8123)
2025-01-03T09:54:14.385982Z DEBUG sink{component_kind="sink" component_id=clickhouse_sink component_type=clickhouse}:request{request_id=34123}:http: hyper::client::pool: pooling idle connection for ("http", clickhouse-clickhouse.clickhouse.svc.cluster.local:8123)
2025-01-03T09:54:14.386006Z DEBUG sink{component_kind="sink" component_id=clickhouse_sink component_type=clickhouse}:request{request_id=34123}:http: vector::internal_events::http_client: HTTP response. status=503 Service Unavailable version=HTTP/1.1 headers={"content-length": "114", "content-type": "text/plain", "date": "Fri, 03 Jan 2025 09:54:13 GMT", "server": "envoy"} body=[114 bytes]
2025-01-03T09:54:14.386056Z WARN sink{component_kind="sink" component_id=clickhouse_sink component_type=clickhouse}:request{request_id=34123}: vector::sinks::util::retries: Internal log [Retrying after response.] has been suppressed 44 times.
2025-01-03T09:54:14.386062Z WARN sink{component_kind="sink" component_id=clickhouse_sink component_type=clickhouse}:request{request_id=34123}: vector::sinks::util::retries: Retrying after response. reason=503 Service Unavailable: upstream connect error or disconnect/reset before headers. retried and the latest reset reason: connection timeout internal_log_rate_limit=true
2025-01-03T09:54:14.386067Z DEBUG sink{component_kind="sink" component_id=clickhouse_sink component_type=clickhouse}:request{request_id=34123}: vector::sinks::util::retries: Retrying request. delay_ms=720
2025-01-03T09:54:15.107150Z DEBUG sink{component_kind="sink" component_id=clickhouse_sink component_type=clickhouse}:request{request_id=34123}:http: vector::internal_events::http_client: Sending HTTP request. uri=http://clickhouse-clickhouse.clickhouse.svc.cluster.local:8123/?input_format_import_nested_json=1&input_format_skip_unknown_fields=1&date_time_input_format=best_effort&query=INSERT+INTO+%22ep_sdc_edge%22.%22some_table%22+FORMAT+JSONEachRow method=POST version=HTTP/1.1 headers={"content-type": "application/x-ndjson", "content-length": "804", "content-encoding": "zstd", "authorization": Sensitive, "accept-encoding": "zstd,gzip,deflate,br", "user-agent": "Vector/0.43.1 (x86_64-unknown-linux-gnu e30bf1f 2024-12-10 16:14:47.175528383)"} body=[804 bytes]
2025-01-03T09:54:15.107195Z DEBUG sink{component_kind="sink" component_id=clickhouse_sink component_type=clickhouse}:request{request_id=34123}:http: hyper::client::pool: reuse idle connection for ("http", clickhouse-clickhouse.clickhouse.svc.cluster.local:8123)
2025-01-03T09:54:15.123728Z DEBUG sink{component_kind="sink" component_id=clickhouse_sink component_type=clickhouse}:request{request_id=34123}:http: hyper::client::pool: pooling idle connection for ("http", clickhouse-clickhouse.clickhouse.svc.cluster.local:8123)
2025-01-03T09:54:15.123748Z DEBUG sink{component_kind="sink" component_id=clickhouse_sink component_type=clickhouse}:request{request_id=34123}:http: vector::internal_events::http_client: HTTP response. status=200 OK version=HTTP/1.1 headers={"date": "Fri, 03 Jan 2025 09:54:15 GMT", "content-type": "text/plain; charset=UTF-8", "x-clickhouse-server-display-name": "chi-clickhouse-clickhouse-pv-0-2-0.chi-clickhouse-clickhouse-pv-0-2.clickhouse.svc.cluster.local", "x-clickhouse-query-id": "c143d4bf-f960-4197-aab8-5a78971f614d", "x-clickhouse-timezone": "UTC", "x-clickhouse-summary": "{\"read_rows\":\"2\",\"read_bytes\":\"537\",\"written_rows\":\"2\",\"written_bytes\":\"537\",\"total_rows_to_read\":\"0\",\"result_rows\":\"2\",\"result_bytes\":\"537\",\"elapsed_ns\":\"14808528\"}", "x-envoy-upstream-service-time": "15", "server": "envoy", "transfer-encoding": "chunked"} body=[unknown]

istio-proxy logs outgoing from the Vector pod:

[2025-01-03T12:05:15.541Z] "POST /?input_format_import_nested_json=1&input_format_skip_unknown_fields=1&date_time_input_format=best_effort&query=INSERT+INTO+%22SOME_TABLE%22+FORMAT+JSONEachRow HTTP/1.1" 503 URX,UF upstream_reset_before_response_started{connection_timeout} - "-" 892 114 10004 - "-" "Vector/0.43.1 (x86_64-unknown-linux-gnu e30bf1f 2024-12-10 16:14:47.175528383)" "3a1172a2-b1ea-43c2-a80f-110995853121" "clickhouse-clickhouse.clickhouse.svc.cluster.local:8123" "100.111.67.8:8123" outbound|8123||clickhouse-clickhouse.clickhouse.svc.cluster.local - 100.111.67.8:8123 100.96.105.200:52044 - default
[2025-01-03T12:04:59.054Z] "POST /?input_format_import_nested_json=1&input_format_skip_unknown_fields=1&date_time_input_format=best_effort&query=INSERT+INTO+%22SOME_TABLE%22+FORMAT+JSONEachRow HTTP/1.1" 503 URX,UF upstream_reset_before_response_started{connection_timeout} - "-" 857 114 30075 - "-" "Vector/0.43.1 (x86_64-unknown-linux-gnu e30bf1f 2024-12-10 16:14:47.175528383)" "b0fa3e20-29b9-4f6d-a684-66f14c2a67b0" "clickhouse-clickhouse.clickhouse.svc.cluster.local:8123" "100.102.120.112:8123" outbound|8123||clickhouse-clickhouse.clickhouse.svc.cluster.local - 100.102.120.112:8123 100.96.105.200:33804 - default
[2025-01-03T12:05:23.567Z] "POST /?input_format_import_nested_json=1&input_format_skip_unknown_fields=1&date_time_input_format=best_effort&query=INSERT+INTO+%22SOME_TABLE%22+FORMAT+JSONEachRow HTTP/1.1" 503 URX,UF upstream_reset_before_response_started{connection_timeout} - "-" 7895 114 10465 - "-" "Vector/0.43.1 (x86_64-unknown-linux-gnu e30bf1f 2024-12-10 16:14:47.175528383)" "4ce19de7-c1b0-49d0-adfe-6989b348582a" "clickhouse-clickhouse.clickhouse.svc.cluster.local:8123" "100.111.67.8:8123" outbound|8123||clickhouse-clickhouse.clickhouse.svc.cluster.local - 100.111.67.8:8123 100.96.105.200:38118 - default
[2025-01-03T12:05:24.031Z] "POST /?input_format_import_nested_json=1&input_format_skip_unknown_fields=1&date_time_input_format=best_effort&query=INSERT+INTO+%22SOME_TABLE%22+FORMAT+JSONEachRow HTTP/1.1" 503 URX,UF upstream_reset_before_response_started{connection_timeout} - "-" 816 114 10001 - "-" "Vector/0.43.1 (x86_64-unknown-linux-gnu e30bf1f 2024-12-10 16:14:47.175528383)" "08e6db41-1ff7-410e-bd84-afd074aea060" "clickhouse-clickhouse.clickhouse.svc.cluster.local:8123" "100.111.67.8:8123" outbound|8123||clickhouse-clickhouse.clickhouse.svc.cluster.local - 100.111.67.8:8123 100.96.105.200:39238 - default
[2025-01-03T12:05:26.056Z] "POST /?input_format_import_nested_json=1&input_format_skip_unknown_fields=1&date_time_input_format=best_effort&query=INSERT+INTO+%22SOME_TABLE%22+FORMAT+JSONEachRow HTTP/1.1" 503 URX,UF upstream_reset_before_response_started{connection_timeout} - "-" 853 114 10000 - "-" "Vector/0.43.1 (x86_64-unknown-linux-gnu e30bf1f 2024-12-10 16:14:47.175528383)" "90a611f4-26f3-404b-a36e-9a6b49841b5b" "clickhouse-clickhouse.clickhouse.svc.cluster.local:8123" "100.111.67.8:8123" outbound|8123||clickhouse-clickhouse.clickhouse.svc.cluster.local - 100.111.67.8:8123 100.96.105.200:52044 - default
[2025-01-03T12:05:34.045Z] "POST /?input_format_import_nested_json=1&input_format_skip_unknown_fields=1&date_time_input_format=best_effort&query=INSERT+INTO+%22SOME_TABLE%22+FORMAT+JSONEachRow HTTP/1.1" 503 URX,UF upstream_reset_before_response_started{connection_timeout} - "-" 875 114 10031 - "-" "Vector/0.43.1 (x86_64-unknown-linux-gnu e30bf1f 2024-12-10 16:14:47.175528383)" "24d96855-d8fa-4b2e-9aa9-594e8e304da3" "clickhouse-clickhouse.clickhouse.svc.cluster.local:8123" "100.111.67.8:8123" outbound|8123||clickhouse-clickhouse.clickhouse.svc.cluster.local - 100.111.67.8:8123 100.96.105.200:38118 - default
[2025-01-03T12:05:34.045Z] "POST /?input_format_import_nested_json=1&input_format_skip_unknown_fields=1&date_time_input_format=best_effort&query=INSERT+INTO+%22SOME_TABLE%22+FORMAT+JSONEachRow HTTP/1.1" 503 URX,UF upstream_reset_before_response_started{connection_timeout} - "-" 872 114 10031 - "-" "Vector/0.43.1 (x86_64-unknown-linux-gnu e30bf1f 2024-12-10 16:14:47.175528383)" "b268b8b4-418b-4635-b4c7-e83d7dfa0021" "clickhouse-clickhouse.clickhouse.svc.cluster.local:8123" "100.111.67.8:8123" outbound|8123||clickhouse-clickhouse.clickhouse.svc.cluster.local - 100.111.67.8:8123 100.96.105.200:39238 - default
[2025-01-03T12:05:36.062Z] "POST /?input_format_import_nested_json=1&input_format_skip_unknown_fields=1&date_time_input_format=best_effort&query=INSERT+INTO+%22SOME_TABLE%22+FORMAT+JSONEachRow HTTP/1.1" 503 URX,UF upstream_reset_before_response_started{connection_timeout} - "-" 825 114 10006 - "-" "Vector/0.43.1 (x86_64-unknown-linux-gnu e30bf1f 2024-12-10 16:14:47.175528383)" "0f92425f-2d25-4bd5-9519-6f798acea71e" "clickhouse-clickhouse.clickhouse.svc.cluster.local:8123" "100.111.67.8:8123" outbound|8123||clickhouse-clickhouse.clickhouse.svc.cluster.local - 100.111.67.8:8123 100.96.105.200:52044 - default
[2025-01-03T12:05:22.482Z] "POST /?input_format_import_nested_json=1&input_format_skip_unknown_fields=1&date_time_input_format=best_effort&query=INSERT+INTO+%22SOME_TABLE%22+FORMAT+JSONEachRow HTTP/1.1" 503 URX,UF upstream_reset_before_response_started{connection_timeout} - "-" 809 114 30083 - "-" "Vector/0.43.1 (x86_64-unknown-linux-gnu e30bf1f 2024-12-10 16:14:47.175528383)" "8ef3b254-e2b9-43c8-9b0b-9dbeae769cce" "clickhouse-clickhouse.clickhouse.svc.cluster.local:8123" "100.105.236.38:8123" outbound|8123||clickhouse-clickhouse.clickhouse.svc.cluster.local - 100.105.236.38:8123 100.96.105.200:60324 - default
[2025-01-03T12:05:44.486Z] "POST /?input_format_import_nested_json=1&input_format_skip_unknown_fields=1&date_time_input_format=best_effort&query=INSERT+INTO+%22SOME_TABLE%22+FORMAT+JSONEachRow HTTP/1.1" 503 URX,UF upstream_reset_before_response_started{connection_timeout} - "-" 866 114 10002 - "-" "Vector/0.43.1 (x86_64-unknown-linux-gnu e30bf1f 2024-12-10 16:14:47.175528383)" "5b4a6614-7298-43b3-aa82-b04110734ee6" "clickhouse-clickhouse.clickhouse.svc.cluster.local:8123" "100.111.67.8:8123" outbound|8123||clickhouse-clickhouse.clickhouse.svc.cluster.local - 100.111.67.8:8123 100.96.105.200:38118 - default
[2025-01-03T12:05:44.370Z] "POST /?input_format_import_nested_json=1&input_format_skip_unknown_fields=1&date_time_input_format=best_effort&query=INSERT+INTO+%22SOME_TABLE%22+FORMAT+JSONEachRow HTTP/1.1" 503 URX,UF upstream_reset_before_response_started{connection_timeout} - "-" 8295 114 10119 - "-" "Vector/0.43.1 (x86_64-unknown-linux-gnu e30bf1f 2024-12-10 16:14:47.175528383)" "7bd0441c-6898-4d25-8788-e0765589b115" "clickhouse-clickhouse.clickhouse.svc.cluster.local:8123" "100.111.67.8:8123" outbound|8123||clickhouse-clickhouse.clickhouse.svc.cluster.local - 100.111.67.8:8123 100.96.105.200:39238 - default

This doesn't seem to be a DNS issue; the IPs are correct:

;clickhouse-clickhouse.clickhouse.svc.cluster.local. IN A
;; ANSWER SECTION:
clickhouse-clickhouse.clickhouse.svc.cluster.local. 30 IN A 100.102.120.105
clickhouse-clickhouse.clickhouse.svc.cluster.local. 30 IN A 100.105.236.61
clickhouse-clickhouse.clickhouse.svc.cluster.local. 30 IN A 100.111.67.13

k get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
chi-clickhouse-clickhouse-pv-0-0-0 2/2 Running 0 154m 100.102.120.105 i-094734b915a844584 <none> <none>
chi-clickhouse-clickhouse-pv-0-1-0 2/2 Running 0 152m 100.111.67.13 i-08e176d047c1fefdc <none> <none>
chi-clickhouse-clickhouse-pv-0-2-0 2/2 Running 0 150m 100.105.236.61 i-02cbff00ea6dc3a56 <none> <none>
-
Alright, after debugging and updating this post, I'm starting to realise this might be an Istio problem: istio-proxy still holds the old endpoints. I'm not going to delete this post, as it might help somebody else.
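To verify the stale-endpoint theory, you can compare the sidecar's view of the upstream cluster with the live Pod IPs. This is just a sketch; the pod name `vector-pod` and namespace `vector-ns` are placeholders for your environment:

```shell
# What endpoints Envoy currently knows for the outbound ClickHouse cluster
# (cluster name taken from the istio-proxy access logs above)
istioctl proxy-config endpoint vector-pod.vector-ns \
  --cluster "outbound|8123||clickhouse-clickhouse.clickhouse.svc.cluster.local"

# The actual Pod IPs behind the Service, for comparison
kubectl get endpoints clickhouse-clickhouse -n clickhouse -o wide
```

If the first list still contains the old Pod IPs after the rollout, the sidecar's endpoint discovery is lagging, which would match the 503s with `upstream_reset_before_response_started{connection_timeout}`.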
-
Okay, this seems to be because Hyper in the ClickHouse sink does connection pooling, and Envoy in Istio does as well. Is there any way to disable connection pooling in the ClickHouse sink and let Istio handle it? See the discussion here: istio/istio#54539
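I'm not aware of a sink option to turn off Hyper's pooling, but one mitigation on the mesh side is to make Envoy recycle upstream connections aggressively via a DestinationRule. This is only a sketch: the resource name and the `idleTimeout` value are assumptions; host and namespace match this thread:

```yaml
# Sketch: make Envoy drop idle/reused connections to ClickHouse quickly,
# so pooled connections to terminated Pods are not kept alive.
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: clickhouse-conn-recycle   # hypothetical name
  namespace: clickhouse
spec:
  host: clickhouse-clickhouse.clickhouse.svc.cluster.local
  trafficPolicy:
    connectionPool:
      http:
        idleTimeout: 30s              # close idle upstream connections
        maxRequestsPerConnection: 1   # effectively disables keep-alive reuse
```

`maxRequestsPerConnection: 1` trades throughput for correctness (a new upstream connection per request), so the `idleTimeout` alone may be the better first step.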