Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ack broken entry succeed in ensemble change unsetting #3005

Open
kezhuw opened this issue Jan 21, 2022 · 0 comments
Open

Ack broken entry succeed in ensemble change unsetting #3005

kezhuw opened this issue Jan 21, 2022 · 0 comments
Labels

Comments

@kezhuw
Copy link
Member

kezhuw commented Jan 21, 2022

BUG REPORT

Describe the bug
Ack broken entry succeed in ensemble change unsetting.

Besides this, LedgerHandle.pendingAddOps were read and then poped in nested:

  • Read in LedgerHandle.unsetSuccessAndSendWriteRequest.
  • Poped in call chain PendingAddOp.unsetSuccessAndSendWriteRequest to LedgerHandle.sendAddSuccessCallbacks

This guess this is dangerous.

To Reproduce
I created a test case branch in my fork. I repeated the scenario that test built here.

This test constructs a scenario:

  • Ensemble size is two, write/ack quorum size is one. Let's assume
    initial ensemble members are b0 and b1.
  • Write four entries e0, e2, .. e3. The distribution algorithm will
    write e0 and e2 to b0, e1 and e3 to b1.
  • e1 completes its write. Bookie b1 crashes. Ensemble changing initiates.
    e0 completes its write but blocked from success due to ensemble changing.
  • Ensemble changing completes. The ledger will unset success and resend
    write requests if needed.

In this scenario, after ensemble changed, all pending entries should proceed
to success finally. Especially, e1 should unset its success and resend its write.

In test case, e1 was confirmed as completed though its ack set broken by ensemble change.

Expected behavior
Ack broken entry should resend their write request for ack fulfillment again.

Screenshots
None.

Additional context
In the test case branch, I pushed a possible fix commit kezhuw@207a140.

kezhuw added a commit to kezhuw/bookkeeper that referenced this issue Feb 11, 2022
Currently, in `LedgerHandle.unsetSuccessAndSendWriteRequest`,
`LedgerHandle.sendAddSuccessCallbacks` could be called by
`PendingAddOp.unsetSuccessAndSendWriteRequest` **before** all pending
adds evaluated. This will make entries which met ack requirement in old
ensemble but have not evaluated yet succeed in new ensemble.

Fixes apache#3005.
kezhuw added a commit to kezhuw/bookkeeper that referenced this issue Jul 30, 2022
Currently, in `LedgerHandle.unsetSuccessAndSendWriteRequest`,
`LedgerHandle.sendAddSuccessCallbacks` could be called by
`PendingAddOp.unsetSuccessAndSendWriteRequest` **before** all pending
adds evaluated. This will make entries which met ack requirement in old
ensemble but have not evaluated yet succeed in new ensemble.

Fixes apache#3005.
kezhuw added a commit to kezhuw/bookkeeper that referenced this issue Aug 24, 2022
Currently, in `LedgerHandle.unsetSuccessAndSendWriteRequest`,
`LedgerHandle.sendAddSuccessCallbacks` could be called by
`PendingAddOp.unsetSuccessAndSendWriteRequest` **before** all pending
adds evaluated. This will make entries which met ack requirement in old
ensemble but have not evaluated yet succeed in new ensemble.

Fixes apache#3005.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant