-
Notifications
You must be signed in to change notification settings - Fork 119
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Skip lightweight delete when there is no data to delete #370
base: main
Are you sure you want to change the base?
Conversation
Hi again Thank you for your contribution! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be best to put this feature behind a flag/setting because it adds a few more queries on top of the regular implementation. Additionally, there are cases where customers know (by definition or other means) that their data doesn't have empty increments, so disabling these unnecessary queries might be beneficial for them.
select count(distinct {{ unique_key }}) from {{ inserting_relation }} | ||
{% endset %} | ||
{% set unique_key_count = run_query(unique_key_query).rows[0].values()[0] %} | ||
{% if unique_key_count == 1 %} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To ensure I fully understood, we fetch the actual keys here, because when you put it as a sup query, you get the issue mentioned in ClickHouse/ClickHouse#69559 ?
_, log = run_dbt_and_capture(["run", "--select", model, "--log-level", "debug"]) | ||
result = project.run_sql(f"select count(*) as num_rows from {model}", fetch="one") | ||
assert delete_filter_log in log | ||
assert result[0] == 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test failed. lw_delete_unique_key_compilation
and lw_delete_composite_unique_key_compilation
have 2 rows each.
Summary
ClickHouse's lightweight delete function, which utilizes the mutation
ALTER TABLE table UPDATE _row_exists = 0 WHERE condition
, indiscriminately mutates all data parts of a given table. This occurs regardless of whether the data parts contain rows that need to be deleted or not, as detailed in this issue.Furthermore, when lightweight delete operations and regular merges are performed on the same table concurrently, the lightweight delete operation is forced to wait for the merge to complete before it can proceed. This leads to unnecessary delays when there are no rows to delete, rendering the
DELETE FROM...
operation wasteful.Therefore, I propose implementing a preliminary check to determine if data in the table meets the deletion criteria before initiating a lightweight delete. If no matching data is found, the delete operation should be skipped, optimizing resource usage and improving overall efficiency.
Checklist
Delete items not relevant to your PR: