Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Skip lightweight delete when there is no data to delete #370

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

Magicbeanbuyer
Copy link

Summary

ClickHouse's lightweight delete function, which utilizes the mutation ALTER TABLE table UPDATE _row_exists = 0 WHERE condition, indiscriminately mutates all data parts of a given table. This occurs regardless of whether the data parts contain rows that need to be deleted or not, as detailed in this issue.

Furthermore, when lightweight delete operations and regular merges are performed on the same table concurrently, the lightweight delete operation is forced to wait for the merge to complete before it can proceed. This leads to unnecessary delays when there are no rows to delete, rendering the DELETE FROM... operation wasteful.

Therefore, I propose implementing a preliminary check to determine if data in the table meets the deletion criteria before initiating a lightweight delete. If no matching data is found, the delete operation should be skipped, optimizing resource usage and improving overall efficiency.

Checklist

Delete items not relevant to your PR:

  • Unit and integration tests covering the common scenarios were added
  • A human-readable description of the changes was provided to include in CHANGELOG

@CLAassistant
Copy link

CLAassistant commented Oct 21, 2024

CLA assistant check
All committers have signed the CLA.

@Magicbeanbuyer Magicbeanbuyer changed the title feat: skip lightweight delete when there is no data to delete skip lightweight delete when there is no data to delete Oct 21, 2024
@Magicbeanbuyer Magicbeanbuyer changed the title skip lightweight delete when there is no data to delete Skip lightweight delete when there is no data to delete Oct 21, 2024
@BentsiLeviav
Copy link
Contributor

Hi again

Thank you for your contribution!
Before reviewing your PR, it is required to add a short description with a link to this PR the the changelog (please keep the current format we have in the Changelog file).

Copy link
Contributor

@BentsiLeviav BentsiLeviav left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be best to put this feature behind a flag/setting because it adds a few more queries on top of the regular implementation. Additionally, there are cases where customers know (by definition or other means) that their data doesn't have empty increments, so disabling these unnecessary queries might be beneficial for them.

select count(distinct {{ unique_key }}) from {{ inserting_relation }}
{% endset %}
{% set unique_key_count = run_query(unique_key_query).rows[0].values()[0] %}
{% if unique_key_count == 1 %}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To ensure I fully understood, we fetch the actual keys here, because when you put it as a sup query, you get the issue mentioned in ClickHouse/ClickHouse#69559 ?

_, log = run_dbt_and_capture(["run", "--select", model, "--log-level", "debug"])
result = project.run_sql(f"select count(*) as num_rows from {model}", fetch="one")
assert delete_filter_log in log
assert result[0] == 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test failed. lw_delete_unique_key_compilation and lw_delete_composite_unique_key_compilation have 2 rows each.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants