-
-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
feat(blog): add clickhouse-replicatedreplacingmergetree (#495)
- Loading branch information
Showing
4 changed files
with
354 additions
and
2 deletions.
There are no files selected for viewing
344 changes: 344 additions & 0 deletions
344
apps/blog/_posts/2024/06/clickhouse-replicatedreplacingmergetree.md
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,344 @@ | ||
--- | ||
title: ClickHouse ReplicatedReplacingMergeTree Engine | ||
date: '2024-06-23' | ||
author: Duyet | ||
series: ClickHouse on Kubernetes | ||
category: Data | ||
tags: | ||
- Data | ||
- ClickHouse | ||
- ClickHouse on Kubernetes | ||
slug: /2024/06/clickhouse-replicatedreplacingmergetree.html | ||
thumbnail: /media/2024/06/clickhouse-replicated/clickhouse-replicatedreplacingmergetree.png | ||
description: Now you have a large single node cluster with a ReplacingMergeTree table that can deduplicate itself. This time, you need more replicated nodes to serve more data users or improve the high availability. | ||
--- | ||
|
||
Now you have a large single node cluster with a [ReplacingMergeTree](https://blog.duyet.net/2024/06/clickhouse-replacingmergetree.html) table that can deduplicate itself. This time, you need more replicated nodes to serve more data users or improve the high availability. | ||
|
||
1. [Cluster setup via ClickHouse Operator](#cluster-setup-via-clickhouse-operator) | ||
1. [Create Replicated table](#create-replicated-table) | ||
1. [Data insert](#data-insert) | ||
1. [Manage replication](#manage-replication) | ||
- [How many replication jobs are running?](#how-many-replication-jobs-are-running) | ||
- [The `system.replicated_fetches` also contains the detail of fetching tasks](#the-systemreplicated_fetches-also-contains-the-detail-of-fetching-tasks) | ||
- [clickhouse-monitoring](#clickhouse-monitoring) | ||
1. [Replication Performance Tuning](#replication-performance-tuning) | ||
1. [References](#references) | ||
|
||
|
||
In this replicated setup, table will be synced between `clickhouse-01` and `clickhouse-02` via ClickHouse Keeper (or Zookeeper). | ||
|
||
|
||
![](/media/2024/06/clickhouse-replicated/clickhouse-replicatedreplacingmergetree.png) | ||
|
||
Replication works at the level of an individual table. Note that Data Replication between nodes is not limited to the ReplicatedReplacingMergeTree engine only, this is just a recommended and the most useful table engine for data engineers. Replication can support all engines in the MergeTree family. | ||
|
||
- ReplicatedMergeTree | ||
- ReplicatedSummingMergeTree | ||
- ReplicatedReplacingMergeTree | ||
- ReplicatedAggregatingMergeTree | ||
- ReplicatedCollapsingMergeTree | ||
- ReplicatedVersionedCollapsingMergeTree | ||
- ReplicatedGraphiteMergeTree | ||
|
||
# Cluster setup via ClickHouse Operator | ||
|
||
With the previous [clickhouse-operator installed](https://blog.duyet.net/2024/03/clickhouse-on-kubernetes.html), you will need | ||
1. to have **Zookeeper** or **ClickHouse Keeper installed** on your Kubernetes. | ||
2. then, you can modify the single `ClickHouseInstallation` or add a new cluster by configuring it to **connect to the Keeper cluster**. | ||
|
||
Following the [document by Altinity here](https://github.com/Altinity/clickhouse-operator/blob/master/docs/zookeeper_setup.md) for setting up Zookeeper, OR install the ClickHouse Keeper using my own [Helm Chart](https://github.com/duyet/charts/tree/master/clickhouse-keeper), which I think is more lightweight than the Java-based Zookeeper. | ||
|
||
```bash | ||
$ helm repo add duyet https://duyet.github.io/charts | ||
$ helm install -n clickhouse clickhouse-keeper duyet/clickhouse-keeper | ||
|
||
$ kubectl -n clickhouse get po | ||
clickhouse-keeper-0 | ||
clickhouse-keeper-1 | ||
clickhouse-keeper-2 | ||
``` | ||
|
||
Let create the ClickHouse cluster with 1 shard and 2 replicas and persistent storage as the manifest below: | ||
|
||
```yaml | ||
# File: clickhouse-cluster.yaml | ||
apiVersion: clickhouse.altinity.com/v1 | ||
kind: ClickHouseInstallation | ||
metadata: | ||
name: cluster | ||
spec: | ||
defaults: | ||
templates: | ||
dataVolumeClaimTemplate: default | ||
podTemplate: clickhouse:24.5 | ||
|
||
configuration: | ||
zookeeper: | ||
nodes: | ||
- host: clickhouse-keeper-0.clickhouse-keepers.clickhouse.svc | ||
port: 2181 | ||
- host: clickhouse-keeper-1.clickhouse-keepers.clickhouse.svc | ||
port: 2181 | ||
- host: clickhouse-keeper-2.clickhouse-keepers.clickhouse.svc | ||
port: 2181 | ||
|
||
clusters: | ||
- name: clickhouse | ||
layout: | ||
shardsCount: 1 | ||
replicasCount: 2 | ||
|
||
templates: | ||
volumeClaimTemplates: | ||
- name: default | ||
spec: | ||
accessModes: | ||
- ReadWriteOnce | ||
resources: | ||
requests: | ||
storage: 500Mi | ||
|
||
podTemplates: | ||
- name: clickhouse:24.5 | ||
spec: | ||
containers: | ||
- name: clickhouse-pod | ||
image: clickhouse/clickhouse-server:24.5 | ||
``` | ||
Apply: | ||
```bash | ||
$ kubectl apply -f clickhouse-cluster.yaml | ||
clickhouseinstallation.clickhouse.altinity.com/cluster created | ||
|
||
$ kubectl get clickhouseinstallation -n clickhouse | ||
NAME CLUSTERS HOSTS STATUS HOSTS-COMPLETED AGE | ||
cluster 1 1 Completed 35s | ||
|
||
$ kubectl get svc -n clickhouse | ||
NAME TYPE CLUSTER-IP PORT(S) AGE | ||
clickhouse-cluster LoadBalancer 10.152.183.156 8123:32286/TCP,9000:30767/TCP 3m45s | ||
chi-cluster-clickhouse-0-0 ClusterIP None <none> 9000/TCP,8123/TCP,9009/TCP 3m47s | ||
chi-cluster-clickhouse-1-0 ClusterIP None <none> 9000/TCP,8123/TCP,9009/TCP 3m47s | ||
``` | ||
|
||
Now you can access your cluster by connecting to any of them, such as `chi-cluster-clickhouse-0-0` or `chi-cluster-clickhouse-1-0`. The `clickhouse-cluster` service can be used as a load balancer, which also routes to one of the two replica pods. | ||
|
||
```bash | ||
$ kubectl port-forward svc/clickhouse-cluster 8123 -n clickhouse | ||
``` | ||
|
||
# Create Replicated table | ||
|
||
Create new table using `ReplicatedReplacingMergeTree` engine with using specified macros, the syntax is: | ||
|
||
```sql | ||
CREATE TABLE default.events ON CLUSTER '{cluster}' | ||
( | ||
`event_time` DateTime, | ||
`event_date` Date DEFAULT toDate(event_time), | ||
`user_id` UInt32, | ||
`event_type` String, | ||
`value` String | ||
) | ||
ENGINE = ReplicatedReplacingMergeTree('/clickhouse/tables/{shard}/{database}/{table}', '{replica}') | ||
PARTITION BY toYYYYMM(event_date) | ||
ORDER BY (user_id, event_type, event_time) | ||
``` | ||
|
||
Operator provides set of ClickHouse [macros](https://clickhouse.com/docs/en/operations/server-configuration-parameters/settings#macros), which are | ||
|
||
1. `{installation}` -- ClickHouse Installation name | ||
2. `{cluster}` -- primary cluster name | ||
3. `{replica}` -- replica name in the cluster, maps to pod service name | ||
4. `{shard}` -- shard id | ||
|
||
`{database}` and `{table}` is built in macro by ClickHouse. | ||
|
||
`/clickhouse/tables/{shard}/{database}/{table}` is a [Zookeeper path](https://zookeeper.apache.org/doc/r3.1.2/zookeeperProgrammers.html). ClickHouse use this to keep track DDL change and data sync between replicas. | ||
|
||
`/clickhouse/tables` is a common prefix and ClickHouse recommend using exactly this one. | ||
|
||
The path above will be expanded to | ||
|
||
- `ReplicatedReplacingMergeTree('/clickhouse/tables/0/default/events', 'chi-cluster-clickhouse-0-0')` on the first node | ||
- `ReplicatedReplacingMergeTree('/clickhouse/tables/0/default/events', 'chi-cluster-clickhouse-1-0')` on the second node | ||
|
||
The `ON CLUSTER '{cluster}'` clause will distribute and create the table on every replica. If you don't use this clause, please connect to each node and create the table yourself. Data will be synchronized from another ClickHouse node to the current one if they have the same Zookeeper path. | ||
|
||
# Data Insert | ||
|
||
Now inserting data into the first node: | ||
|
||
```bash | ||
$ kubectl exec chi-cluster-clickhouse-0-0 -- clickhouse client -q "INSERT INTO events (user_id, event_type, event_time, value) VALUES (111, 'click', '2024-07-01 00:00:00', '/home')" | ||
``` | ||
|
||
And check data in the second node: | ||
|
||
```bash | ||
$ kubectl exec chi-cluster-clickhouse-1-0 -- clickhouse client -q "SELECT COUNT() FROM events" | ||
|
||
┌─COUNT()─┐ | ||
│ 1 │ | ||
└─────────┘ | ||
``` | ||
|
||
Or count from all replicas, run this on any node: | ||
|
||
```bash | ||
$ kubectl exec chi-cluster-clickhouse-0-0 -- clickhouse client -q "SELECT hostName(), COUNT() FROM clusterAllReplicas(default.events) GROUP BY 1" | ||
|
||
┌─hostName()─────────────────┬─count()─┐ | ||
│ chi-cluster-clickhouse-0-0 │ 1 │ | ||
│ chi-cluster-clickhouse-1-0 │ 1 │ | ||
└────────────────────────────┴─────────┘ | ||
``` | ||
# Manage replication | ||
|
||
With `Replicated*MergeTree` tables, you may sometimes see that it is slow to determine if they are up to date with another instance, or how to ensure they are already synchronized. ClickHouse provides some system tables to help us monitor that. | ||
|
||
### How many replication jobs are running? | ||
|
||
```sql | ||
SELECT | ||
database, | ||
table, | ||
type, | ||
count() AS count_all, | ||
countIf(is_currently_executing) AS count_executing | ||
FROM system.replication_queue | ||
GROUP BY ALL | ||
ORDER BY count_all DESC | ||
|
||
┌─database──┬─table──────┬─type─────┬─count_all─┬─count_executing─┐ | ||
│ data_lake │ events │ GET_PART │ 9319 │ 0 │ | ||
└───────────┴────────────┴──────────┴───────────┴─────────────────┘ | ||
``` | ||
|
||
`count_all` contains number of jobs in the queue need to be done, and `count_executing` is the current number of jobs are running. | ||
|
||
Total of count_executing will not large than the [`background_fetches_pool_size`](https://clickhouse.com/docs/en/operations/server-configuration-parameters/settings#background_fetches_pool_size) setting. | ||
|
||
```sql | ||
SELECT * | ||
FROM system.metrics | ||
WHERE metric LIKE '%Fetch%' | ||
|
||
|
||
┌─metric────────────────────┬─value─┬─description──────────────────────────────────────────────────────────────┐ | ||
│ ReplicatedFetch │ 32 │ Number of data parts being fetched from replica │ | ||
│ BackgroundFetchesPoolTask │ 16 │ Number of active fetches in an associated background pool │ | ||
│ BackgroundFetchesPoolSize │ 32 │ Limit on number of simultaneous fetches in an associated background pool │ | ||
└───────────────────────────┴───────┴──────────────────────────────────────────────────────────────────────────┘ | ||
``` | ||
|
||
### The `system.replicated_fetches` also contains the detail of fetching tasks | ||
|
||
Number of jobs are running: | ||
|
||
```sql | ||
SELECT COUNT() | ||
FROM system.replicated_fetches | ||
|
||
┌─count()─┐ | ||
│ 16 │ | ||
└─────────┘ | ||
``` | ||
|
||
The replicated fetches detail: | ||
|
||
```sql | ||
SELECT * | ||
FROM system.replicated_fetches | ||
LIMIT 1 | ||
FORMAT Vertical | ||
|
||
Query id: 14373025-d312-47f4-af9e-63877e8eab36 | ||
|
||
Row 1: | ||
────── | ||
database: data_lake | ||
table: events | ||
elapsed: 673.575720277 | ||
progress: 0.6662606659500931 | ||
result_part_name: 20210214_7_7_309 | ||
result_part_path: /var/lib/clickhouse/store/e47/e478ebd3-e831-4b23-8e5c-66e3b5320170/20210214_7_7_309/ | ||
partition_id: 20210214 | ||
total_size_bytes_compressed: 2705400808 | ||
bytes_read_compressed: 1802502144 | ||
source_replica_path: /clickhouse/tables/data_lake/events/0/replicas/chi-cluster-clickhouse-0-0 | ||
to_detached: 0 | ||
thread_id: 670 | ||
``` | ||
|
||
### clickhouse-monitoring | ||
|
||
All these query above can be easily managed via [clickhouse-monitoring](http://github.com/duyet/clickhouse-monitoring) UI tool. Checking [previous post](https://blog.duyet.net/2024/03/clickhouse-monitoring.html#4-clickhouse-monitoring-ui-dashboard) for installation and usage. | ||
|
||
# Replication Performance Tuning | ||
|
||
Parts will be fetched immediately or during hours or days, depending on your table size. If you see the replication is too slow, consider checking the `system.replication_queue` table, which shows the entries with `postpone_reason`. | ||
|
||
```sql | ||
SELECT | ||
type, | ||
postpone_reason | ||
FROM system.replication_queue | ||
WHERE is_currently_executing = 0 | ||
LIMIT 5 | ||
|
||
Query id: 64130c98-9536-4c41-abe3-5f26a27a2ffb | ||
|
||
┌─type─────┬─postpone_reason──────────────────────────────────────────────────────────────────────────────┐ | ||
│ GET_PART │ Not executing fetch of part 20230217_11_11_143 because 16 fetches already executing, max 16. │ | ||
│ GET_PART │ Not executing fetch of part 20230218_41_41_222 because 16 fetches already executing, max 16. │ | ||
│ GET_PART │ Not executing fetch of part 20230221_7_7_81 because 16 fetches already executing, max 16. │ | ||
│ GET_PART │ Not executing fetch of part 20230228_11_11_42 because 16 fetches already executing, max 16. │ | ||
│ GET_PART │ Not executing fetch of part 20230311_3_3_141 because 16 fetches already executing, max 16. │ | ||
└──────────┴──────────────────────────────────────────────────────────────────────────────────────────────┘ | ||
``` | ||
|
||
This is mostly ok because the maximum replication slots just are being used. In case you believe your cluster can handle more than that, consider to increase the pool size [`background_fetches_pool_size`](https://clickhouse.com/docs/en/operations/server-configuration-parameters/settings#background_fetches_pool_size) | ||
|
||
```xml | ||
<yandex> | ||
<background_fetches_pool_size>32</background_fetches_pool_size> | ||
<yandex> | ||
``` | ||
|
||
The `MergeTree` setting [`replicated_max_parallel_fetches_for_host`](https://clickhouse.com/docs/en/operations/settings/merge-tree-settings#replicated_max_parallel_fetches_for_host) also limits that (default: 15) and is not well-documented. We should also consider increasing that. | ||
|
||
```xml | ||
<yandex> | ||
<profile> | ||
<default> | ||
<replicated_max_parallel_fetches_for_host>32</replicated_max_parallel_fetches_for_host> | ||
</default> | ||
</profile> | ||
<yandex> | ||
``` | ||
|
||
|
||
|
||
The [`replicated_fetches_http_connection_timeout`](https://clickhouse.com/docs/en/operations/settings/merge-tree-settings#replicated_fetches_http_connection_timeout) and [`replicated_fetches_http_receive_timeout`](https://clickhouse.com/docs/en/operations/settings/merge-tree-settings#replicated_fetches_http_receive_timeout) sometimes helps if you see a lot of timeout errors in the ClickHouse logs, but wouldn't it be better to reduce the pool size instead. | ||
|
||
```xml | ||
<yandex> | ||
<profile> | ||
<default> | ||
<replicated_fetches_http_connection_timeout>30</replicated_fetches_http_connection_timeout> | ||
<replicated_fetches_http_receive_timeout>180</replicated_fetches_http_receive_timeout> | ||
</default> | ||
</profile> | ||
<yandex> | ||
``` | ||
|
||
# References | ||
- [Setup ClickHouse cluster with data replication](https://github.com/Altinity/clickhouse-operator/blob/master/docs/replication_setup.md) | ||
- [Update ClickHouse Installation - add replication to existing non-replicated cluster](https://github.com/Altinity/clickhouse-operator/blob/master/docs/chi_update_add_replication.md) | ||
- [Converting MergeTree to Replicated](https://kb.altinity.com/altinity-kb-setup-and-maintenance/altinity-kb-converting-mergetree-to-replicated/) | ||
|
||
|
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Binary file added
BIN
+95 KB
...media/2024/06/clickhouse-replicated/clickhouse-replicatedreplacingmergetree.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
2d3c80d
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Successfully deployed to the following URLs:
cv – ./apps/cv
duyet-cv.vercel.app
cv-git-master-duyet-team.vercel.app
cv.duyet.net
cv-duyet-team.vercel.app
2d3c80d
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Successfully deployed to the following URLs:
blog – ./apps/blog
blog-git-master-duyet-team.vercel.app
blog-duyet-team.vercel.app
duyet.vercel.app
blog.duyet.net
2d3c80d
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Successfully deployed to the following URLs:
insights – ./apps/insights
duyet-insights.vercel.app
insights-duyet-team.vercel.app
insights-git-master-duyet-team.vercel.app
insights.duyet.net