-
Notifications
You must be signed in to change notification settings - Fork 34
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DBZ-8594 Prevent data loss when primary key update is last operation in transaction #224
base: main
Are you sure you want to change the base?
Conversation
@jpechane Can you give this is a look when you get the chance? Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that possible to use Kafka transaction to solve the problem?
If DELETE, Tombstone, CREATE are 3 events belong to the same Vitess Transaction, should we put the three message publishing into Kafka in the same Kafka Transaction? The downstream consumer won't see the messages until the whole transaction commits?
@HenryCaiHaiying There are a few reasons to avoid transactions
|
if (message.isTransactionalMessage()) { | ||
// Tx BEGIN/END event | ||
offsetContext.rotateVgtid(newVgtid, message.getCommitTime()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the old code, rotateVgtid() is called on all cases of isTransactionMessage() (including BEGIN and COMMIT and other unknown types), do we need to cover that unknown types in the new code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For unknown types, these are the only two places we create the instance, with either begin/commit, so there are no other expected types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be safer to add an else {assert here} in case our assumptions are not correct.
@@ -154,10 +163,6 @@ else if (message.getOperation().equals(ReplicationMessage.Operation.HEARTBEAT)) | |||
|
|||
offsetContext.event(tableId, message.getCommitTime()); | |||
offsetContext.setShard(message.getShard()); | |||
if (isLastRowOfTransaction) { | |||
// Right before processing the last row, reset the previous offset to the new vgtid so the last row has the new vgtid as offset. | |||
offsetContext.resetVgtid(newVgtid, message.getCommitTime()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we lost this resetVgid() in the new code, I guess you are assuming the COMMIT message will follow later, does COMMIT message always happens? Also from the old commit, it seems the last row has the new vgtid, looks like this won't be the same in the new code, any implications on this behavior change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does COMMIT message always happens?
VitessReplicationConnection ensures that we always receive a COMMIT after a BEGIN (otherwise throws an error). There is one vstream copy edge case where duplicate BEGINs can be received, but it discards the events so this code doesn't need to handle those.
Also from the old commit, it seems the last row has the new vgtid, looks like this won't be the same in the new code, any implications on this behavior change?
Yes, this is the key part that leads to the data loss bug.
Let the current transaction VGTID be n
, previous is n-1
, next transaction is n+1
Previous behavior:
- BEGIN - rotateVgtid - set currentVgtid to
n
, restartVgtid isn-1
- INSERT/UPDATE/DELETE not last operation - no-op on offset context
- INSERT/UPDATE/DELETE that is the last operation - reset vgtid - sets the currentVgtid to be equal to the
n
. Set restartVgtid equal ton
(wrongfully, since it may still have multiple messages to send in case of primary key update (delete, tombstone, create), and can fail part way between them). - COMMIT - rotateVgtid - only does anything if the newVgtid does not equal the currentVgtid, but since the newVgtid is still
n
and the currentVgtid has already been set ton
, this is a no-op and does nothing.
New behavior:
- BEGIN - rotateVgtid - set currentVgtid to
n
, restartVgtid is then-1
- INSERT/UPDATE/DELETE any order, last or not - no op on offset context
- COMMIT - resetVgtid - set current & restart VGTIDs to
n
- the offset will only be committed if the commit event is successfully produced (if tx metadata is enabled) or when it receives a write for then+1
(since the offset vgtid ie the restartVgtid will point ton
) or if it sends a heartbeat event (in the case no tx metadata, and no subsequent writes, that heartbeat will have offset / restartVgtid set ton
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good
We found an edge case described in DBZ-8594
There's another possible way to implement this: Modify VitessChangeRecordEmitter - we could modify our implementation to set the offset before emitting the final delete record here. This would be possible but it seemed like a more custom solution, we'd prefer to be as standard as possible. Let me know if you think this is preferred. I think VitessOffsetContext would be called there to resetVgtid (and maybe in create/delete/update functions as well). The thing I don't like is we don't get the bug fixes or changes from the class we inherit from so we have to maintain more custom/overridden logic.
The main downside of this approach is seen in
testCopyTableAndRestart
ie if we do a special gtid (for snapshot or to speed up to current) then after that no other operations happen (no new vgtid's) then the restart vgtid is still the special one (snapshot or fastforward to current) so that operation would get repeated. Perhaps the gudiance we can give users is to enable transaction metadata (so begin/commit events are always sent even if tables are unaffected), but even that is not a guarantee.Couple examples of other connectors I think may be at risk
Postgres
SQL Server