materialize-databricks: debugging logs for duplicates #2259

Open: wants to merge 4 commits into main

mdibaiee (Member) commented Jan 9, 2025

Description:

(Describe the high level scope of new or changed features)

Workflow steps:

(How does one use this feature, and how has it changed)

Documentation links affected:

(list any documentation links that you created, or existing ones that you've identified as needing updates, along with a brief description)

Notes for reviewers:

(anything that might help someone review this PR)



williamhbaker (Member) left a comment

LGTM as a debug-only build that we won't actually be merging. I don't see any reason to think this would cause data corruption; the only cost should be the performance hit from running the duplicate-finder queries.
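For context, a duplicate-finder query of this kind can be a `GROUP BY` over the binding's key columns with a `HAVING COUNT(*) > 1` filter. It only reads the table, which is why the concern is cost (it typically scans the whole table) rather than data safety. The Go sketch below builds such a query. `duplicateKeysQuery`, `quoteIdent`, and the example table and column names are hypothetical. The sketch assumes Databricks' backtick-delimited identifiers, with embedded backticks doubled, and a table name that is already quoted and fully qualified. It is not the query used in this PR.

```go
package main

import (
	"fmt"
	"strings"
)

// quoteIdent wraps a column name in backticks, doubling any embedded
// backticks (assumed Databricks delimited-identifier syntax).
func quoteIdent(name string) string {
	return "`" + strings.ReplaceAll(name, "`", "``") + "`"
}

// duplicateKeysQuery (hypothetical) returns a query listing every key tuple
// that appears more than once in table, along with its row count.
// table is assumed to be already quoted and fully qualified.
func duplicateKeysQuery(table string, keys []string) string {
	cols := make([]string, len(keys))
	for i, k := range keys {
		cols[i] = quoteIdent(k)
	}
	list := strings.Join(cols, ", ")
	return fmt.Sprintf(
		"SELECT %s, COUNT(*) AS n FROM %s GROUP BY %s HAVING COUNT(*) > 1",
		list, table, list)
}

func main() {
	fmt.Println(duplicateKeysQuery("`main`.`default`.`orders`", []string{"id", "region"}))
	// SELECT `id`, `region`, COUNT(*) AS n FROM `main`.`default`.`orders` GROUP BY `id`, `region` HAVING COUNT(*) > 1
}
```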

```go
		Max: converted[0],
	}
} else {
	switch v := converted[0].(type) {
```

See here for what I believe is the full set of types that keys can take. Large keys might be encoded as uint64, or possibly as float64 (not 100% sure on that) if the otherwise-integer values are really big.


```go
}

var binding = d.bindingForStateKey(stateKey)
```

You might run this both before and after the Store query runs; that way we can tell whether there were already duplicates in the table before our query ran.

mdibaiee force-pushed the mahdi/databricks-debug branch from 5826c64 to b04dbcd on January 9, 2025 11:36
mdibaiee force-pushed the mahdi/databricks-debug branch from b04dbcd to 0c30954 on January 9, 2025 12:05
2 participants