dbt incremental _configuration_changes produces false positive results #955

Open
vvsivaprasadreddy opened this issue Mar 5, 2025 · 1 comment
Labels
bug Something isn't working

Comments

@vvsivaprasadreddy

Describe the bug

The diff check that compares the existing config with the new config in the Databricks incremental materialization does not evaluate correctly. It reports config changes on every run and then tries to apply tags and tblproperties each time.
link to code

Steps To Reproduce

Define tags and table properties in an incremental model's config, then run the model several times. Every run tries to apply the tags and table properties again, even though the config has not changed.

Expected behavior

The get_diff function in the TagsConfig class should compare only the tags and tblproperties, and they should be applied only when there is an actual difference.
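To make the expected behavior concrete, here is a minimal sketch of a tag diff that returns nothing when the two configs match. `TagsSketch` is a hypothetical stand-in written for this report, not the adapter's `TagsConfig`. Only the field names `set_tags` and `unset_tags` are taken from the log output in this issue, and the real `get_diff` signature may differ.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class TagsSketch:
    """Hypothetical stand-in for TagsConfig (assumption, not the adapter's class)."""

    set_tags: Dict[str, str] = field(default_factory=dict)
    unset_tags: Tuple[str, ...] = ()

    def get_diff(self, existing: "TagsSketch") -> Optional["TagsSketch"]:
        # Tags that are missing on the table or whose value differs.
        to_set = {
            key: value
            for key, value in self.set_tags.items()
            if key not in existing.set_tags or existing.set_tags[key] != value
        }
        # Tags present on the table but no longer in the model config.
        to_unset = tuple(key for key in existing.set_tags if key not in self.set_tags)
        if not to_set and not to_unset:
            return None  # nothing to apply
        return TagsSketch(set_tags=to_set, unset_tags=to_unset)


# Same tags as in the log output of this issue: no change expected.
existing = TagsSketch(set_tags={"reporting": "", "dbt": ""})
model = TagsSketch(set_tags={"reporting": "", "dbt": ""})
no_change = model.get_diff(existing)

# A model that really changes its tags should still produce a diff.
changed = TagsSketch(set_tags={"reporting": "", "owner": "x"}).get_diff(existing)
```

With identical tags the sketch returns `None`, so nothing would be re-applied. Only the changed model yields a non-empty diff.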

Screenshots and log output

Sample log output that reports changes even though the existing config and the model config contain the same tags and the model's tblproperties are already present on the table:

Existing config:

config={'tags': TagsConfig(set_tags={'reporting': '', 'dbt': ''}, unset_tags=[]), 'tblproperties': TblPropertiesConfig(tblproperties={'clusteringColumns': '[["employee_id"]]', 'delta.checkpoint.writeStatsAsJson': 'false', 'delta.checkpoint.writeStatsAsStruct': 'true', 'delta.enableDeletionVectors': 'true', 'delta.enableRowTracking': 'true', 'delta.feature.clustering': 'supported', 'delta.feature.deletionVectors': 'supported', 'delta.feature.domainMetadata': 'supported', 'delta.feature.invariants': 'supported', 'delta.feature.rowTracking': 'supported', 'delta.feature.timestampNtz': 'supported', 'delta.minReaderVersion': '3', 'delta.minWriterVersion': '7', 'delta.rowTracking.materializedRowCommitVersionColumnName': '_row-commit-version-col-53fe29f4-05f5-4ae2-8bb0-f07437b6dff1', 'delta.rowTracking.materializedRowIdColumnName': '_row-id-col-eba33e5b-0bc2-4f98-ba1c-dca95cce5c6d'}, pipeline_id=None, ignore_list=['pipelines.pipelineId', 'delta.enableChangeDataFeed', 'delta.minReaderVersion', 'delta.minWriterVersion', 'pipeline_internal.catalogType', 'pipelines.metastore.tableName', 'pipeline_internal.enzymeMode', 'clusteringColumns', 'delta.enableRowTracking', 'delta.feature.appendOnly', 'delta.feature.changeDataFeed', 'delta.feature.checkConstraints', 'delta.feature.domainMetadata', 'delta.feature.generatedColumns', 'delta.feature.invariants', 'delta.feature.rowTracking', 'delta.rowTracking.materializedRowCommitVersionColumnName', 'delta.rowTracking.materializedRowIdColumnName', 'spark.internal.pipelines.top_level_entry.user_specified_name'])}

Model config:

config={'tags': TagsConfig(set_tags={'reporting': '', 'dbt': ''}, unset_tags=[]), 'tblproperties': TblPropertiesConfig(tblproperties={'delta.feature.timestampNtz': 'supported'}, pipeline_id=None, ignore_list=['pipelines.pipelineId', 'delta.enableChangeDataFeed', 'delta.minReaderVersion', 'delta.minWriterVersion', 'pipeline_internal.catalogType', 'pipelines.metastore.tableName', 'pipeline_internal.enzymeMode', 'clusteringColumns', 'delta.enableRowTracking', 'delta.feature.appendOnly', 'delta.feature.changeDataFeed', 'delta.feature.checkConstraints', 'delta.feature.domainMetadata', 'delta.feature.generatedColumns', 'delta.feature.invariants', 'delta.feature.rowTracking', 'delta.rowTracking.materializedRowCommitVersionColumnName', 'delta.rowTracking.materializedRowIdColumnName', 'spark.internal.pipelines.top_level_entry.user_specified_name'])}

Model config changes:

changes={'tags': TagsConfig(set_tags={'reporting': '', 'dbt': ''}, unset_tags=[]), 'tblproperties': TblPropertiesConfig(tblproperties={'delta.feature.timestampNtz': 'supported'}, pipeline_id=None, ignore_list=['pipelines.pipelineId', 'delta.enableChangeDataFeed', 'delta.minReaderVersion', 'delta.minWriterVersion', 'pipeline_internal.catalogType', 'pipelines.metastore.tableName', 'pipeline_internal.enzymeMode', 'clusteringColumns', 'delta.enableRowTracking', 'delta.feature.appendOnly', 'delta.feature.changeDataFeed', 'delta.feature.checkConstraints', 'delta.feature.domainMetadata', 'delta.feature.generatedColumns', 'delta.feature.invariants', 'delta.feature.rowTracking', 'delta.rowTracking.materializedRowCommitVersionColumnName', 'delta.rowTracking.materializedRowIdColumnName', 'spark.internal.pipelines.top_level_entry.user_specified_name'])} requires_full_refresh=False
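For the tblproperties side, the log above suggests a comparison along the following lines would report no change. This is a sketch under two assumptions that the adapter may not share: properties that exist only on the table are left alone, and keys in the ignore list are never compared. `tblproperties_diff` is a hypothetical helper, and the dictionaries are abbreviated from the log.

```python
from typing import Dict, Iterable


def tblproperties_diff(
    model: Dict[str, str],
    existing: Dict[str, str],
    ignore_list: Iterable[str] = (),
) -> Dict[str, str]:
    """Return only the model properties that are missing or different on the table."""
    ignored = set(ignore_list)
    return {
        key: value
        for key, value in model.items()
        if key not in ignored and existing.get(key) != value
    }


# Abbreviated from the "Existing config" and "Model config" log lines.
existing_props = {
    "delta.checkpoint.writeStatsAsJson": "false",
    "delta.enableDeletionVectors": "true",
    "delta.feature.timestampNtz": "supported",
    "delta.minReaderVersion": "3",
}
model_props = {"delta.feature.timestampNtz": "supported"}
ignore = ["delta.minReaderVersion", "delta.minWriterVersion"]

unchanged = tblproperties_diff(model_props, existing_props, ignore)
added = tblproperties_diff({"delta.appendOnly": "true"}, existing_props, ignore)
```

Here `unchanged` is an empty dict because the only property in the model config is already set on the table with the same value. `added` contains the single property the table does not yet have.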

System information

Core:

  • installed: 1.9.2
  • latest: 1.9.2 - Up to date!

Plugins:

  • spark: 1.9.1 - Up to date!
  • databricks: 1.9.4 - Update available!

Windows 11 (Docker dev container in VS Code)

Python 3.10.16

@vvsivaprasadreddy vvsivaprasadreddy added the bug Something isn't working label Mar 5, 2025
@benc-db
Collaborator

benc-db commented Mar 5, 2025

Thanks for reporting
