DBZ-6721 & DBZ-6722 Add transforms: local vgtid, remove field, and filter transaction topic #194
Hi @twthorn, thanks for your contribution. Please prefix the commit message(s) with the DBZ-xxx JIRA issue key.
@jpechane all updated with the refactor to move this from core to here. Tests are all passing, ready for review.
@twthorn LGTM, thanks. Could you please rebase the PR on the latest main and remove the merge commit(s)? Once it is cleaned up, the PR is ready to be merged.
@jpechane thanks for the review, updated!
@twthorn Applied, thanks
Could you please provide a short update to the Vitess connector docs describing the new capability? Thanks
PR to update docs: debezium/debezium#5634
DBZ-6721
DBZ-6722
Add three new transforms
Filter Transaction Topic
For connectors where the volume is high (the transaction topic is massive), this transform drops all messages matching the transaction topic's schema.
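A minimal sketch of the filtering idea, not the code in this PR. In Kafka Connect, an SMT drops a record by returning null from its apply method; the sketch mimics that with a hypothetical `Message` stand-in instead of a real Connect record so it runs without dependencies. The schema name constant is an assumption about how transaction metadata messages are identified.

```java
final class FilterTransactionTopicSketch {
    // Assumed name of the transaction metadata value schema; check the connector before relying on it.
    static final String TX_SCHEMA_NAME = "io.debezium.connector.common.TransactionMetadataValue";

    /** Hypothetical stand-in for a Connect record: only the topic and value schema name. */
    static final class Message {
        final String topic;
        final String valueSchemaName;

        Message(String topic, String valueSchemaName) {
            this.topic = topic;
            this.valueSchemaName = valueSchemaName;
        }
    }

    /** Returns null (drop) for transaction metadata messages, and the message unchanged otherwise. */
    static Message apply(Message message) {
        return TX_SCHEMA_NAME.equals(message.valueSchemaName) ? null : message;
    }
}
```

Matching on the schema name rather than the topic name means the filter keeps working if the transaction topic is renamed or rerouted.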
Remove Field
Generic config for specifying the names of fields to remove (nested fields are separated by a dot '.'). It can be used to drop any field; in this case we use it for transaction.id, which can be very large in Vitess because it contains the GTIDs of all shards.
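To illustrate the dot-separated path convention, here is a small sketch that removes a nested field from a plain `Map`. The actual transform works on Connect records and their schemas; the class and method names below are hypothetical and only show the path handling.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class RemoveFieldSketch {
    /** Returns a copy of {@code value} without the field at the dot-separated path; the input is not modified. */
    static Map<String, Object> removeField(Map<String, Object> value, String dottedPath) {
        return removeField(value, dottedPath.split("\\."), 0);
    }

    @SuppressWarnings("unchecked")
    private static Map<String, Object> removeField(Map<String, Object> value, String[] path, int depth) {
        Map<String, Object> copy = new LinkedHashMap<>(value);
        String name = path[depth];
        if (depth == path.length - 1) {
            // Last path segment: drop the field itself (no-op if it is absent).
            copy.remove(name);
        } else if (copy.get(name) instanceof Map) {
            // Intermediate segment: descend into the nested structure and replace it with the pruned copy.
            copy.put(name, removeField((Map<String, Object>) copy.get(name), path, depth + 1));
        }
        return copy;
    }
}
```

Calling `removeField(event, "transaction.id")` would return an event whose `transaction` block keeps its other fields but no longer has `id`.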
Local VGTID
Update the VGTID field to contain only the VGTID of the shard in which the change happened.
We add this local VGTID transformation, along with a hashmap so that we don't have to iterate over all shards to do this.
The VGTID must stay global for proper offset/state storage, so the change is applied only at the end, as an SMT before the message is published, and it does not affect the source offset of the message.
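The lookup described above can be sketched as follows. This is an illustration under assumptions, not the PR's implementation: the `Vgtid` and `ShardGtid` classes here are simplified stand-ins, the map is keyed by shard name only (assuming a single keyspace), and the GTID strings are placeholders.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class LocalVgtidSketch {
    /** Simplified stand-in for one shard's position. */
    static final class ShardGtid {
        final String keyspace;
        final String shard;
        final String gtid;

        ShardGtid(String keyspace, String shard, String gtid) {
            this.keyspace = keyspace;
            this.shard = shard;
            this.gtid = gtid;
        }
    }

    /** Simplified stand-in for a VGTID: the positions of all shards plus a per-shard index. */
    static final class Vgtid {
        private final List<ShardGtid> shardGtids;
        private final Map<String, ShardGtid> byShard = new HashMap<>();

        Vgtid(List<ShardGtid> shardGtids) {
            this.shardGtids = Collections.unmodifiableList(new ArrayList<>(shardGtids));
            for (ShardGtid shardGtid : this.shardGtids) {
                byShard.put(shardGtid.shard, shardGtid); // index once so lookups avoid a scan over all shards
            }
        }

        List<ShardGtid> shardGtids() {
            return shardGtids;
        }

        /** Builds a new single-shard VGTID; the global VGTID (used for offsets) is left untouched. */
        Vgtid localFor(String shard) {
            ShardGtid shardGtid = byShard.get(shard);
            if (shardGtid == null) {
                throw new IllegalArgumentException("Unknown shard: " + shard);
            }
            return new Vgtid(Collections.singletonList(shardGtid));
        }
    }
}
```

Because `localFor` returns a new object, the global VGTID stored with the source offset is unchanged and only the emitted message carries the reduced value.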
Itests
Also add itests that cover all of these features together, showing how they can be used to receive transaction order metadata without massively increasing the volume of data produced.