Simplify the Tsavorite checkpoint state machine #1059

Open
wants to merge 25 commits into
base: main

badrishc
Contributor

@badrishc badrishc commented Feb 28, 2025

  • Use a single task (StateMachineDriver.RunStateMachine) to drive the state machine instead of letting user threads cooperatively run it. This can also reduce the tail latency of the unlucky user operations that would otherwise have to drive the state machine.
  • Do not let threads work on the prepare and in_progress phases at the same time. The "checkpoint version switch barrier" option is consequently removed as well. This removes the CPR_SHIFT_DETECTED and LatchDestination.Retry code paths.
  • State machines no longer have an OnThreadState component. We also eliminate the ThreadStateMachine step of individual threads. This aspect of checkpointing was complex and error-prone.
  • Remove LightEpoch's Mark and CheckIsComplete as a result of the thread-level simplifications mentioned above.
  • Checkpoints and the state machine driver sit outside the store now, making it possible to drive multiple stores with the same state machine driver. This will allow us to have a single (v) -> (v+1) switch across the string and object stores in Garnet.
  • Remove the atomic switch of session context and the maintenance of two session contexts, since sessions no longer have any work or state associated with checkpoint version switches.
  • Currently, as in main, (v) transactions have to end in prepare, before (v+1) transactions can start in in_progress. However, this PR simplifies the future optimization of allowing (v+1) transactions to start as soon as (v) transactions have acquired all their locks. This will significantly reduce the overhead of the barrier during the checkpoint.
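
To illustrate the single-driver idea from the list above, here is a minimal Python sketch, not Tsavorite code. The `Phase`, `Store`, and `Driver` names, the `on_phase` callback, and the three-phase sequence are all hypothetical and chosen only for illustration; the real state machine has more phases and is written in C#. The sketch shows one task stepping every registered store through the same phase before moving on, so no store is in prepare while another is in in_progress, and all stores switch from (v) to (v+1) together.

```python
import asyncio
from enum import Enum


class Phase(Enum):
    # Simplified, hypothetical phase set; the real state machine has more phases.
    REST = 0
    PREPARE = 1
    IN_PROGRESS = 2


class Store:
    """Hypothetical store that only records the transitions it is told about."""

    def __init__(self, name):
        self.name = name
        self.version = 1
        self.log = []

    async def on_phase(self, phase, version):
        # A real store would do checkpoint work here; this one just records.
        self.log.append((phase, version))
        if phase is Phase.IN_PROGRESS:
            self.version = version


class Driver:
    """Single task that drives every store through the same phase sequence."""

    def __init__(self, stores):
        self.stores = stores
        self.phase = Phase.REST
        self.version = 1

    async def run_state_machine(self):
        next_version = self.version + 1
        steps = [
            (Phase.PREPARE, self.version),
            (Phase.IN_PROGRESS, next_version),
            (Phase.REST, next_version),
        ]
        for phase, version in steps:
            self.phase = phase
            # Every store completes this phase before any store sees the next one.
            await asyncio.gather(*(s.on_phase(phase, version) for s in self.stores))
        self.version = next_version
        return self.version
```

Under these assumptions, calling `asyncio.run(Driver([Store("string"), Store("object")]).run_state_machine())` moves both stores from version 1 to version 2 in lockstep, and user threads never run any part of the state machine themselves.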

@badrishc badrishc marked this pull request as ready for review March 1, 2025 01:22
@badrishc badrishc requested a review from TedHartMS March 2, 2025 19:13