Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

source-mysql: Fix primary-key update logic when applied to only-changes bindings #2201

Merged
merged 3 commits into from
Dec 13, 2024

Conversation

willdonnelly
Copy link
Member

@willdonnelly willdonnelly commented Dec 12, 2024

Description:

We recently (in #2185) added some logic to source-mysql which detects row update events which modify the collection key of the row. Or more pedantically, it detects row updates which modify the value of "whatever columns we consider the row key to be made of", which is almost always the collection key.

(We play a little fast-and-loose with the concepts at times and it works out because in practice nobody ever sets the collection key to something other than the source DB primary key or tries to tells us to backfill using some other key distinct from the collection key. In fact we should probably deprecate the ability to have a backfill key distinct from the collection key since it's an undocumented backwards-compatibility feature from years ago and I bet nobody is actually using it).

This distinction matters because in the past bindings with the "Only Changes" backfill mode haven't needed to care what the backfill key / scan key / collection key / whatever-you-want-to-call-it is set to, and so we haven't even initialized that state property for such bindings. They go straight from pending to active and don't set KeyColumns along the way.

But now, we need to detect update events which modify the row's key, even if they're on an "Only Changes" binding. This means that either we need to add entirely new plumbing into the replication logic for "what properties correspond to the collection key", or we need to just accept that this is already what KeyColumns holds and therefore it's a bug that we're not setting it on only-changes bindings.

I have opted to go with the latter solution because it's simpler and feels more elegant anyway.

Workflow steps:

No user action is required, and the new primary-key update logic in MySQL should now work reliably for only-changes bindings going forward.

Notes for reviewers:

This PR is pretty small but if you find it at all confusing it might be clearer when viewed commit-by-commit. It has three parts:

  1. A test case demonstrating the issue and eventual fixed behavior.
  2. A small refactoring of the activatePendingStreams so that any future only-changes bindings will have a correctly initialized KeyColumns property.
  3. A bit of temporary migration logic in reconcileStateWithBindings which identifies preexisting only-changes bindings which are already active with a nil KeyColumns and fills in the appropriate value. This can be removed in the near future after it's had a chance to run everywhere that matters.

This change is Reviewable

Apparently the new "primary key update" logic doesn't quite work
right when combined with an `"Only Changes"` backfill mode on a
binding because we don't initialize the `KeyColumns` state. This
new test and snapshot demonstrate the issue.
Modifies the `activatePendingStreams()` logic slightly so that
the `state.KeyColumns` property is set for bindings in the
`"Only Changes"` mode. The only time it should be nil for
an active table is when that table has no primary key.
This commit adds a temporary bit of migration logic which will
fill in `state.KeyColumns` for any already-active only-changes
bindings with a nil value for that state property.

This means that preexisting bindings will behave the same as
any new ones initialized after the corresponding change to the
activatePendingStreams logic.

Since this will immediately add the missing property as soon
as a task restarts and it will persist thereafter, the migration
can probably be removed in just a few days at most.
@willdonnelly willdonnelly added the change:unplanned Unplanned change, useful for things like doc updates label Dec 12, 2024
@willdonnelly willdonnelly requested a review from a team December 12, 2024 23:23
Copy link
Member

@williamhbaker williamhbaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@willdonnelly willdonnelly merged commit 027f511 into main Dec 13, 2024
52 of 53 checks passed
@willdonnelly willdonnelly deleted the wgd/mysql-primary-key-updates-20241112 branch December 13, 2024 22:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
change:unplanned Unplanned change, useful for things like doc updates
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants