fix(snapshots): store BlockInfo in Resharding(CatchingUp) status #12651
Conversation
Could be worth merging a cleaned-up version of this. A failing test that passes after this PR can be produced by applying this diff:

```diff
diff --git a/chain/chain/src/flat_storage_resharder.rs b/chain/chain/src/flat_storage_resharder.rs
index 83bd481f3..a51751088 100644
--- a/chain/chain/src/flat_storage_resharder.rs
+++ b/chain/chain/src/flat_storage_resharder.rs
@@ -71,6 +71,7 @@ pub struct FlatStorageResharder {
     runtime: Arc<dyn RuntimeAdapter>,
     /// The current active resharding event.
     resharding_event: Arc<Mutex<Option<FlatStorageReshardingEventStatus>>>,
+    resharding_hash: Arc<Mutex<Option<CryptoHash>>>,
     /// Sender responsible to convey requests to the dedicated resharding actor.
     sender: ReshardingSender,
     /// Controls cancellation of background processing.
@@ -102,6 +103,7 @@ impl FlatStorageResharder {
         Self {
             runtime,
             resharding_event,
+            resharding_hash: Arc::new(Mutex::new(None)),
             sender,
             controller,
             resharding_config,
@@ -233,6 +235,7 @@ impl FlatStorageResharder {
     }

     fn set_resharding_event(&self, event: FlatStorageReshardingEventStatus) {
+        *self.resharding_hash.lock().unwrap() = Some(event.resharding_hash());
         *self.resharding_event.lock().unwrap() = Some(event);
     }

@@ -336,13 +339,13 @@ impl FlatStorageResharder {
             }
         };

-        #[cfg(feature = "test_features")]
-        {
-            if self.adv_should_delay_task(&resharding_hash, chain_store) {
-                info!(target: "resharding", "flat storage shard split task has been artificially postponed!");
-                return FlatStorageReshardingTaskResult::Postponed;
-            }
-        }
+        // #[cfg(feature = "test_features")]
+        // {
+        //     if self.adv_should_delay_task(&resharding_hash, chain_store) {
+        //         info!(target: "resharding", "flat storage shard split task has been artificially postponed!");
+        //         return FlatStorageReshardingTaskResult::Postponed;
+        //     }
+        // }

         // We know that the resharding block has become final so let's start the real work.
         let (parent_shard, split_params) = self
@@ -609,7 +612,15 @@ impl FlatStorageResharder {
         if self.controller.is_cancelled() {
            return FlatStorageReshardingTaskResult::Cancelled;
         }
-        info!(target: "resharding", ?shard_uid, "flat storage shard catchup task started");
+        #[cfg(feature = "test_features")]
+        {
+            let resharding_hash = self
+                .resharding_hash.lock().unwrap().unwrap();
+            if self.adv_should_delay_task(&resharding_hash, chain_store) {
+                info!(target: "resharding", "flat storage catchup task has been artificially postponed!");
+                return FlatStorageReshardingTaskResult::Postponed;
+            }
+        }
         let metrics = FlatStorageReshardingShardCatchUpMetrics::new(&shard_uid);
         // Apply deltas and then create the flat storage.
         let apply_result = self.shard_catchup_apply_deltas(shard_uid, chain_store, &metrics);
```

on top of the first "drop non-kept snapshot columns in tests" commit in this PR and running
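The diff above caches the resharding hash in an `Arc<Mutex<Option<CryptoHash>>>` so the catch-up task can retrieve it later without access to the original event. A minimal, self-contained sketch of that pattern (with a plain `Hash` type alias standing in for `CryptoHash`, and `Resharder` as a hypothetical stand-in for `FlatStorageResharder`):

```rust
use std::sync::{Arc, Mutex};

// Stand-in for CryptoHash; the real type lives in near-primitives.
type Hash = [u8; 4];

struct Resharder {
    // Mirrors the `resharding_hash: Arc<Mutex<Option<CryptoHash>>>` field
    // added in the diff: shared, mutable, initially unset.
    resharding_hash: Arc<Mutex<Option<Hash>>>,
}

impl Resharder {
    fn new() -> Self {
        Self { resharding_hash: Arc::new(Mutex::new(None)) }
    }

    // Analogue of `set_resharding_event`: record the hash when the
    // resharding event is set.
    fn set_event(&self, hash: Hash) {
        *self.resharding_hash.lock().unwrap() = Some(hash);
    }

    // Analogue of the catch-up task: read the cached hash later, possibly
    // from a clone of the Arc held by a background task.
    fn current_hash(&self) -> Option<Hash> {
        *self.resharding_hash.lock().unwrap()
    }
}

fn main() {
    let r = Resharder::new();
    assert_eq!(r.current_hash(), None);
    r.set_event([1, 2, 3, 4]);
    assert_eq!(r.current_hash(), Some([1, 2, 3, 4]));
    println!("ok");
}
```

Note the `.unwrap()` on the cached `Option` in the real diff panics if the catch-up task runs before any event is set, which is acceptable for a test-only (`test_features`) hook but would need handling in production code.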
Codecov Report: Attention: Patch coverage is

Additional details and impacted files:

```
@@            Coverage Diff            @@
##           master   #12651   +/-   ##
=======================================
  Coverage   70.49%   70.49%
=======================================
  Files         845      845
  Lines      172247   172265     +18
  Branches   172247   172265     +18
=======================================
+ Hits       121426   121439     +13
- Misses      45725    45726      +1
- Partials     5096     5100      +4
```
Please consider putting `BlockInfo` into the `CatchingUp` status. It is probably easier dependency-wise and implementation-wise.
Yeah, that's probably better. I think when I implemented it the first time I just forgot that it's fine to change the DB structures, since there hasn't been a release in between. PTAL
#12589 made a change required for taking snapshots of child shards that are catching up after resharding: if we want to take a state snapshot of a child shard's flat storage whose status is `Resharding(CatchingUp)`, we set it to `Ready` in the state snapshot. To do this, we need to be able to figure out the correct `BlockInfo` from just the hash, so we read it from the block headers column. But this column is not kept in state snapshots (which was previously overlooked in test loop, since everything was copied). Fix it by storing a `BlockInfo` in the `Resharding(CatchingUp)` status so we don't need to look it up anymore.
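The shape of the fix can be sketched with simplified stand-in types: the `CatchingUp` variant carries the full `BlockInfo` instead of just a hash, so promoting it to `Ready` at snapshot time needs no lookup in a column the snapshot does not keep. (The field names and the `status_for_snapshot` helper below are illustrative, not the actual nearcore definitions.)

```rust
// Simplified stand-in for the real BlockInfo (hash fields shown as u64).
#[derive(Clone, Debug, PartialEq)]
struct BlockInfo {
    hash: u64,
    height: u64,
    prev_hash: u64,
}

// Simplified flat storage status. Before the fix, CatchingUp held only a
// hash, forcing a read of the block headers column (absent from snapshots)
// to recover the BlockInfo; after the fix it holds the BlockInfo itself.
#[derive(Clone, Debug, PartialEq)]
enum FlatStorageStatus {
    Ready(BlockInfo),
    CatchingUp(BlockInfo),
}

// Hypothetical helper: when taking a state snapshot, a catching-up child
// shard is recorded as Ready, using the BlockInfo stored in the variant.
fn status_for_snapshot(status: &FlatStorageStatus) -> FlatStorageStatus {
    match status {
        FlatStorageStatus::CatchingUp(info) => FlatStorageStatus::Ready(info.clone()),
        other => other.clone(),
    }
}

fn main() {
    let info = BlockInfo { hash: 42, height: 100, prev_hash: 41 };
    let snap = status_for_snapshot(&FlatStorageStatus::CatchingUp(info.clone()));
    assert_eq!(snap, FlatStorageStatus::Ready(info));
    println!("ok");
}
```

Changing the variant's stored type is a DB-format change, which (as noted in the review exchange above) is fine here because no release shipped the old format in between.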