Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for Multiple Actions in MERGE and Column-Specific Updates #949

Open
liangonli opened this issue Mar 1, 2025 · 0 comments
Open
Labels
enhancement New feature or request

Comments

@liangonli
Copy link

liangonli commented Mar 1, 2025

Describe the feature

Currently, in dbt-databricks v1.9.7, the incremental merge strategy does not fully leverage the flexibility of Databricks' MERGE INTO statement.

According to the Databricks documentation, it is possible to:

  1. Define multiple MATCHED conditions with different actions (e.g., UPDATE, DELETE), each with its own condition.
  2. Define multiple NOT MATCHED BY SOURCE conditions with different actions (e.g., DELETE, UPDATE).
  3. Specify which columns to update in an UPDATE action instead of updating all columns by default.

However, in dbt-databricks v1.9.7, these capabilities are not fully supported:

It is not possible to specify multiple MATCHED or NOT MATCHED BY SOURCE actions.
The action for MATCHED is not configurable.
We cannot specify which columns should be updated in an UPDATE action.

Describe alternatives you've considered

Manually implementing a MERGE INTO statement using a custom post-hook, but this reduces the benefits of dbt’s incremental logic and maintainability.
Using a two-step process: updating/deleting records separately by 2 merging, but this introduces additional complexity and performance overhead.

Additional context

Databricks supports defining multiple conditions within MERGE INTO, making it more flexible for incremental processing. Supporting these features in dbt-databricks would allow users to take full advantage of this flexibility and optimize their data pipelines.

Who will this benefit?

Data engineers & analysts using incremental models in dbt-databricks.
Use cases where for example:

  1. Different update conditions need to be applied to different rows.
  2. Certain rows should be deleted instead of updated based on specific criteria.
  3. Only specific columns should be updated to prevent unnecessary writes.

Are you interested in contributing this feature?

The configurations could be like this:
matched_delete_condition
matched_update_condition
not_matched_by_source_delete_condition
not_matched_by_source_update_condition

@liangonli liangonli added the enhancement New feature or request label Mar 1, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

1 participant