Optimal Format for ReplacingMergeTree Inserts #2058

woodhull · 2024-09-08T19:50:10Z

I'm sharing our use case to see whether the PeerDB sync could evolve to support it.

We're using a custom table definition with PeerDB.

CREATE TABLE nashville_staging.people
(
    `id` UInt64,
    `account_id` UInt32,
    `organization_id` UInt64,
    `first_name` String,
    `last_name` String,
    `properties` String,
    `created_at` DateTime64(6),
    `updated_at` DateTime64(6),
    `assigned_id` UInt32,
    `tsvector` String,
    `last_conversation_id` UInt64,
    `_peerdb_synced_at` DateTime64(9) DEFAULT now(),
    `_peerdb_is_deleted` UInt8,
    `_peerdb_version` UInt64
)
ENGINE = SharedReplacingMergeTree('/clickhouse/tables/{uuid}/{shard}', '{replica}', _peerdb_version, _peerdb_is_deleted)
PRIMARY KEY account_id
ORDER BY (account_id, id)
SETTINGS index_granularity = 8192

All of our queries filter on account_id, so I wanted to use it as the Primary Key.

In ClickHouse, the PRIMARY KEY must be a prefix of the ORDER BY.

This works well with the sync of new rows and updates (since the account_id for rows in this table does not change), but falls apart for deletes.

The delete row that PeerDB inserts contains 0 for account_id rather than the actual value. Its sorting key (0, id) therefore differs from the original row's (account_id, id), so it never matches the row it is supposed to remove in the SharedReplacingMergeTree.
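To make the mismatch concrete, here is a toy Python model of how a ReplacingMergeTree collapses rows by sorting key. It is only a sketch of the idea, not ClickHouse's merge code: the `Row` type, the `collapse` function, and the sample values are made up for illustration, and the model assumes a merge that also cleans up delete markers.

```python
from typing import NamedTuple


class Row(NamedTuple):
    account_id: int
    id: int
    version: int      # stands in for _peerdb_version
    is_deleted: int   # stands in for _peerdb_is_deleted


def collapse(rows: list[Row]) -> list[Row]:
    """Toy model of a ReplacingMergeTree merge with cleanup.

    Rows are grouped by the sorting key (account_id, id). Only the row with
    the highest version survives per key, and a surviving delete marker is
    dropped from the result.
    """
    latest: dict[tuple[int, int], Row] = {}
    for row in rows:
        key = (row.account_id, row.id)
        if key not in latest or row.version >= latest[key].version:
            latest[key] = row
    return sorted(r for r in latest.values() if not r.is_deleted)


original = Row(account_id=42, id=1, version=1, is_deleted=0)
delete_with_zero = Row(account_id=0, id=1, version=2, is_deleted=1)
delete_with_key = Row(account_id=42, id=1, version=2, is_deleted=1)

# Delete marker keyed on (0, 1): a different key, so the original row survives.
print(collapse([original, delete_with_zero]))
# Delete marker keyed on (42, 1): same key, so the original row is removed.
print(collapse([original, delete_with_key]))
```

In this model the delete only takes effect when the marker carries the same account_id as the row it targets, which is the behavior described above.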

I think this means we're effectively stuck only using id as the primary key in these tables.

I'm not sure whether this is possible architecturally, but could PeerDB pass the full row values, or perhaps a specified set of column values, through to ClickHouse on a CDC delete? This would allow more complex / custom table definitions to be synced while still supporting deletes.
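As a rough sketch of what I mean, assuming the CDC delete event carries the old row's values for the columns in question: the delete marker would copy a configured set of columns from the old row instead of writing defaults. Everything here is hypothetical (`build_delete_row`, `SORTING_KEY_COLUMNS`, the dict-based row format); it is not PeerDB's actual code or API.

```python
from typing import Any

# Hypothetical per-table setting: columns to carry over on delete.
SORTING_KEY_COLUMNS = ("account_id", "id")


def build_delete_row(
    old_row: dict[str, Any],
    columns: list[str],
    version: int,
    carry: tuple[str, ...] = SORTING_KEY_COLUMNS,
) -> dict[str, Any]:
    """Build a delete marker that keeps the `carry` columns from the old row.

    All other columns are set to None here as a stand-in for the column
    default that the destination table would apply.
    """
    missing = [c for c in carry if c not in old_row]
    if missing:
        raise KeyError(f"old row lacks key columns: {missing}")
    row = {c: (old_row[c] if c in carry else None) for c in columns}
    row["_peerdb_is_deleted"] = 1
    row["_peerdb_version"] = version
    return row


old = {"id": 1, "account_id": 42, "first_name": "Ada"}
print(build_delete_row(old, ["id", "account_id", "first_name"], version=7))
```

With account_id carried over, the delete marker would share the (account_id, id) sorting key with the row it is meant to replace.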

Alternatively, this might all have been a bad idea and I should have stuck with an id Primary Key matched with Order By, like the PeerDB default -- I'm just trying to make sure that the tables are well modeled for our anticipated queries.

This is more of an architecture / design question than the typical GitHub Issue. Sharing it here in this format in case it's useful to others.


woodhull · 2024-09-08T19:52:12Z
