Merge pull request #2969 from Blargian/add_CollapsingMergeTree_to_updates_page

Add `CollapsingMergeTree` to updates page
Blargian authored Jan 10, 2025
2 parents 65dc79a + 1888c21 commit 0fd529a
Showing 1 changed file with 61 additions and 9 deletions: docs/en/managing-data/updates.md
description: How to update data in ClickHouse
keywords: [update, updating data]
---

import CloudAvailableBadge from '@theme/badges/CloudAvailableBadge';

## Differences between updating data in ClickHouse and OLTP databases

When it comes to handling updates, ClickHouse and OLTP databases diverge significantly due to their underlying design philosophies and target use cases. For example, PostgreSQL, a row-oriented, ACID-compliant relational database, supports robust and transactional update and delete operations, ensuring data consistency and integrity through mechanisms like Multi-Version Concurrency Control (MVCC). This allows for safe and reliable modifications even in high-concurrency environments.
…

For both operations, if the number of submitted mutations constantly exceeds the …

In summary, update operations should be issued carefully, and the mutations queue should be tracked closely using the `system.mutations` table. Do not issue updates frequently as you would in OLTP databases. If you have a requirement for frequent updates, see [ReplacingMergeTree](/en/engines/table-engines/mergetree-family/replacingmergetree).
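A minimal sketch of issuing an update mutation and then checking the mutations queue, assuming a `posts` table with `Id` and `ViewCount` columns (the same table used in the lightweight update example on this page). The `system.mutations` columns queried are standard; the table name and filter values are illustrative only.

```sql
-- Hypothetical example: increment the view count of one post.
-- By default this returns as soon as the mutation is queued;
-- the affected parts are rewritten in the background.
ALTER TABLE posts
    UPDATE ViewCount = ViewCount + 1
    WHERE Id = 404346;

-- List mutations on this table that have not finished yet.
SELECT
    mutation_id,
    command,
    create_time,
    parts_to_do,
    latest_fail_reason
FROM system.mutations
WHERE database = currentDatabase()
  AND table = 'posts'
  AND is_done = 0;
```

If the statement itself should wait for the mutation to finish, set `mutations_sync = 1` (current server) or `mutations_sync = 2` (all replicas) before issuing it.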

| Method | Syntax | When to use |
|---------------------------------------------------------------------------------------|--------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| [Update mutation](/en/sql-reference/statements/alter/update) | `ALTER TABLE [table] UPDATE` | Use when data must be updated to disk immediately (e.g. for compliance). Negatively affects `SELECT` performance. |
| [Lightweight update](/en/guides/developer/lightweight-update) (ClickHouse Cloud) | `ALTER TABLE [table] UPDATE` | Enable using `SET apply_mutations_on_fly = 1;`. Use when updating small amounts of data. Rows are immediately returned with updated data in all subsequent `SELECT` queries but are initially only internally marked as updated on disk. |
| [ReplacingMergeTree](/en/engines/table-engines/mergetree-family/replacingmergetree) | `ENGINE = ReplacingMergeTree` | Use when updating large amounts of data. This table engine is optimized for data deduplication on merges. |
| [CollapsingMergeTree](/en/engines/table-engines/mergetree-family/collapsingmergetree) | `ENGINE = CollapsingMergeTree(Sign)` | Use when updating individual rows frequently, or for scenarios where you need to maintain the latest state of objects that change over time. For example, tracking user activity or article stats. |
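For comparison with the mutation-based methods, here is a minimal `ReplacingMergeTree` sketch. The table name `posts_latest` and the `Version` column are hypothetical. Each "update" is a plain `INSERT` of a row with the same sorting key and a higher version; older versions are removed during background merges, and `FINAL` applies the same deduplication at query time.

```sql
-- Hypothetical table: one row per post, the highest Version wins.
CREATE TABLE posts_latest
(
    Id        UInt32,
    ViewCount UInt32,
    Version   UInt64  -- rows with the same Id keep the one with the highest Version
)
ENGINE = ReplacingMergeTree(Version)
ORDER BY Id;

INSERT INTO posts_latest VALUES (404346, 26762, 1);
-- "Update" the row by inserting a newer version with the same sorting key.
INSERT INTO posts_latest VALUES (404346, 26763, 2);

-- FINAL deduplicates at query time, even if the two parts are not merged yet.
SELECT * FROM posts_latest FINAL;
-- returns one row: 404346, 26763, 2
```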

The sections below cover update mutations, lightweight updates, and `CollapsingMergeTree` in more detail.

## Update Mutations

Update mutations can be issued through an `ALTER TABLE … UPDATE` command, e.g.:

```sql
ALTER TABLE posts_temp
…
```

These are extremely IO-heavy, rewriting all the parts that match the `WHERE` exp…

Read more about [update mutations](/en/sql-reference/statements/alter/update).

## Lightweight Updates

<CloudAvailableBadge />

Lightweight updates provide a mechanism to update rows such that they are updated immediately, and subsequent `SELECT` queries automatically return the changed values (this incurs an overhead and will slow queries). This effectively addresses the atomicity limitation of normal mutations. We show an example below:

```sql
…
FROM posts
WHERE Id = 404346

┌─ViewCount─┐
│     26762 │
└───────────┘

1 row in set. Elapsed: 0.115 sec. Processed 59.55 million rows, 238.25 MB (517.83 million rows/s., 2.07 GB/s.)

…
FROM posts
WHERE Id = 404346

┌─ViewCount─┐
│     26763 │
└───────────┘

1 row in set. Elapsed: 0.149 sec. Processed 59.55 million rows, 259.91 MB (399.99 million rows/s., 1.75 GB/s.)
```

Note that for lightweight updates, a mutation is still used to update the data; …

Read more about [lightweight updates](/en/guides/developer/lightweight-update).

## Collapsing Merge Tree

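Stemming from the idea that updates are expensive but inserts can be leveraged to perform updates,
the [`CollapsingMergeTree`](/en/engines/table-engines/mergetree-family/collapsingmergetree) table engine
can be used together with a `Sign` column to tell ClickHouse to update a specific row by collapsing (deleting)
a pair of rows with sign `1` and `-1`.
A row inserted with `Sign = 1` is a "state" row, which ClickHouse keeps.
A row inserted with `Sign = -1` is a "cancel" row: it repeats the values of an earlier state row and cancels it, so the pair is deleted.
Inserting a cancel row on its own therefore deletes the row, while inserting a cancel row followed by a new state row updates it.
Rows are matched based on the sorting key defined in the table's `ORDER BY` clause.
Collapsing happens asynchronously during background merges, so queries should aggregate using `Sign` (as in the example below) to return correct results before a merge has taken place.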

```sql
CREATE TABLE UAct
(
    UserID UInt64,
    PageViews UInt8,
    Duration UInt8,
    Sign Int8 -- A special column used with the CollapsingMergeTree table engine
)
ENGINE = CollapsingMergeTree(Sign)
ORDER BY UserID;

INSERT INTO UAct VALUES (4324182021466249494, 5, 146, 1);
INSERT INTO UAct VALUES (4324182021466249494, 5, 146, -1); -- Sign = -1 cancels the previous state of this row
INSERT INTO UAct VALUES (4324182021466249494, 6, 185, 1); -- the row is replaced with the new state

SELECT
    UserID,
    sum(PageViews * Sign) AS PageViews,
    sum(Duration * Sign) AS Duration
FROM UAct
GROUP BY UserID
HAVING sum(Sign) > 0;

┌──────────────UserID─┬─PageViews─┬─Duration─┐
│ 4324182021466249494 │         6 │      185 │
└─────────────────────┴───────────┴──────────┘
```
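Because collapsing only happens during background merges, the `sum(... * Sign)` aggregation above is one way to get correct results before a merge has run. As an alternative sketch on the same `UAct` table, the `FINAL` modifier applies the collapsing logic at query time; it is simpler to write but can be considerably slower on large tables. `OPTIMIZE TABLE … FINAL` forces a merge instead, which is expensive and not intended for routine use.

```sql
-- Collapse rows at query time instead of aggregating by Sign.
SELECT * FROM UAct FINAL;
-- returns one row: 4324182021466249494, 6, 185, 1

-- Alternatively, force an unscheduled merge so the collapsed state is written to disk.
OPTIMIZE TABLE UAct FINAL;
```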

:::note
The approach above for updating requires users to maintain state client-side: to insert a cancel row, the client must know the values of the state row it is cancelling.
While this is most efficient from ClickHouse's perspective, it can be complex to work with at scale.

We recommend reading the documentation
for [`CollapsingMergeTree`](/en/engines/table-engines/mergetree-family/collapsingmergetree)
for a more comprehensive overview.
:::
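To make the client-side state requirement concrete, here is a sketch continuing the `UAct` example above. The new values (8 page views, a duration of 250) are hypothetical and chosen only for illustration. Each cancel row repeats the last state row that was inserted for the user, which is exactly the state the client has to remember.

```sql
-- Update: cancel the current state (6, 185) and insert the new state (8, 250).
INSERT INTO UAct VALUES
    (4324182021466249494, 6, 185, -1),
    (4324182021466249494, 8, 250, 1);

-- Delete: insert only a cancel row for the current state (8, 250).
INSERT INTO UAct VALUES (4324182021466249494, 8, 250, -1);

-- sum(Sign) is now 0 for this user, so HAVING filters it out.
SELECT
    UserID,
    sum(PageViews * Sign) AS PageViews,
    sum(Duration * Sign) AS Duration
FROM UAct
GROUP BY UserID
HAVING sum(Sign) > 0;
-- returns no rows
```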

## More Resources

- [Handling Updates and Deletes in ClickHouse](https://clickhouse.com/blog/handling-updates-and-deletes-in-clickhouse)
