Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DBZ-8594 Prevent data loss when primary key update is last operation in transaction #224

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

twthorn
Copy link
Contributor

@twthorn twthorn commented Jan 21, 2025

We found an edge case described in DBZ-8594

There's another possible way to implement this: Modify VitessChangeRecordEmitter - we could modify our implementation to set the offset before emitting the final delete record here. This would be possible but it seemed like a more custom solution, we'd prefer to be as standard as possible. Let me know if you think this is preferred. I think VitessOffsetContext would be called there to resetVgtid (and maybe in create/delete/update functions as well). The thing I don't like is we don't get the bug fixes or changes from the class we inherit from so we have to maintain more custom/overridden logic.

The main downside of this approach is seen in testCopyTableAndRestart ie if we do a special gtid (for snapshot or to speed up to current) then after that no other operations happen (no new vgtid's) then the restart vgtid is still the special one (snapshot or fastforward to current) so that operation would get repeated. Perhaps the gudiance we can give users is to enable transaction metadata (so begin/commit events are always sent even if tables are unaffected), but even that is not a guarantee.

Couple examples of other connectors I think may be at risk
Postgres
SQL Server

@twthorn
Copy link
Contributor Author

twthorn commented Jan 21, 2025

@jpechane Can you give this is a look when you get the chance? Thanks!

Copy link
Contributor

@HenryCaiHaiying HenryCaiHaiying left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that possible to use Kafka transaction to solve the problem?

If DELETE, Tombstone, CREATE are 3 events belong to the same Vitess Transaction, should we put the three message publishing into Kafka in the same Kafka Transaction? The downstream consumer won't see the messages until the whole transaction commits?

@twthorn
Copy link
Contributor Author

twthorn commented Jan 22, 2025

@HenryCaiHaiying There are a few reasons to avoid transactions

  • Tight coupling - Debezium embedded engine mode allows Debezium to be run in a standard java application without kafka/kafka connect. We do not want to make any assumptions about what other external systems will be associated with Debezium. We simply want each record to contain correct content (ie the offset refers to the last fully processed transaction)
  • Performance - enabling transactions has some throughput hit
  • Rigidity - all deploys must have transactions enabled to operate correctly

if (message.isTransactionalMessage()) {
// Tx BEGIN/END event
offsetContext.rotateVgtid(newVgtid, message.getCommitTime());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the old code, rotateVgtid() is called on all cases of isTransactionMessage() (including BEGIN and COMMIT and other unknown types), do we need to cover that unknown types in the new code?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For unknown types, these are the only two places we create the instance, with either begin/commit, so there are no other expected types.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be safer to add an else {assert here} in case our assumptions are not correct.

@@ -154,10 +163,6 @@ else if (message.getOperation().equals(ReplicationMessage.Operation.HEARTBEAT))

offsetContext.event(tableId, message.getCommitTime());
offsetContext.setShard(message.getShard());
if (isLastRowOfTransaction) {
// Right before processing the last row, reset the previous offset to the new vgtid so the last row has the new vgtid as offset.
offsetContext.resetVgtid(newVgtid, message.getCommitTime());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we lost this resetVgid() in the new code, I guess you are assuming the COMMIT message will follow later, does COMMIT message always happens? Also from the old commit, it seems the last row has the new vgtid, looks like this won't be the same in the new code, any implications on this behavior change?

Copy link
Contributor Author

@twthorn twthorn Jan 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does COMMIT message always happens?

VitessReplicationConnection ensures that we always receive a COMMIT after a BEGIN (otherwise throws an error). There is one vstream copy edge case where duplicate BEGINs can be received, but it discards the events so this code doesn't need to handle those.

Also from the old commit, it seems the last row has the new vgtid, looks like this won't be the same in the new code, any implications on this behavior change?

Yes, this is the key part that leads to the data loss bug.

Let the current transaction VGTID be n, previous is n-1, next transaction is n+1

Previous behavior:

  1. BEGIN - rotateVgtid - set currentVgtid to n, restartVgtid is n-1
  2. INSERT/UPDATE/DELETE not last operation - no-op on offset context
  3. INSERT/UPDATE/DELETE that is the last operation - reset vgtid - sets the currentVgtid to be equal to the n. Set restartVgtid equal to n (wrongfully, since it may still have multiple messages to send in case of primary key update (delete, tombstone, create), and can fail part way between them).
  4. COMMIT - rotateVgtid - only does anything if the newVgtid does not equal the currentVgtid, but since the newVgtid is still n and the currentVgtid has already been set to n, this is a no-op and does nothing.

New behavior:

  1. BEGIN - rotateVgtid - set currentVgtid to n, restartVgtid is the n-1
  2. INSERT/UPDATE/DELETE any order, last or not - no op on offset context
  3. COMMIT - resetVgtid - set current & restart VGTIDs to n - the offset will only be committed if the commit event is successfully produced (if tx metadata is enabled) or when it receives a write for the n+1 (since the offset vgtid ie the restartVgtid will point to n) or if it sends a heartbeat event (in the case no tx metadata, and no subsequent writes, that heartbeat will have offset / restartVgtid set to n)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants