
fix(resharding): fix Chain::get_shards_to_state_sync() #12617

Open
wants to merge 3 commits into master

Conversation

marcelo-gonzalez
Contributor

This function was left unimplemented for the resharding v3 case, so here we implement it. For each shard ID, if there hasn't been a resharding, we say that we want to state sync it if we'll be tracking it next epoch and don't currently track it. If there has been a resharding, we state sync it if the previous condition is true and we aren't currently generating the shard itself as part of resharding.

Since this shards-to-state-sync calculation is made when we preprocess the first block of an epoch, there are 3 relevant epochs here: prev_epoch, current_epoch, next_epoch. When we process the last block of prev_epoch, start_resharding() will be called and will initiate the child shard splitting if we were tracking the parent of the split shards during prev_epoch (this is true even if we don't track either child in current_epoch). So if we aren't tracking a child shard during current_epoch, we still don't want to state sync it if we were tracking the parent previously.
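As a rough illustration of that decision, here is a minimal sketch with plain booleans; the real code in should_catch_up_shard() goes through the epoch manager and shard tracker, so the function and parameter names below are only illustrative, not the actual API:

// Sketch of the per-shard decision described above, reduced to booleans.
fn should_state_sync_shard(
    will_track_next_epoch: bool,        // will we track this shard in next_epoch?
    tracks_this_epoch: bool,            // do we already track it in current_epoch?
    is_resharding_child: bool,          // was this shard just split off from a parent?
    tracked_parent_in_prev_epoch: bool, // did we track the parent during prev_epoch?
) -> bool {
    // We only ever state sync shards we will track next epoch but don't track now.
    if !will_track_next_epoch || tracks_this_epoch {
        return false;
    }
    // If this is a freshly split child whose parent we tracked, the resharding
    // started at the end of prev_epoch will generate the child state locally,
    // so state sync is unnecessary.
    !(is_resharding_child && tracked_parent_in_prev_epoch)
}

fn main() {
    // A node that tracked the parent, tracks neither child now, and will track
    // a child next epoch: resharding covers it, no state sync.
    assert!(!should_state_sync_shard(true, false, true, true));
    // The same tracking pattern without a resharding: state sync is needed.
    assert!(should_state_sync_shard(true, false, false, false));
}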

@marcelo-gonzalez marcelo-gonzalez requested a review from a team as a code owner December 13, 2024 08:21
@marcelo-gonzalez
Contributor Author

A few things:

  • After landing fix(resharding): wait until child flat storages are split to take snapshots #12589, we can add a test that fails without this PR: just set the epoch length to 10 in test_resharding_v3_shard_shuffling_slower_post_processing_tasks. That makes it fail because one of the nodes (account6) tracks shard 6 (the parent of the split shards) in the first 2 epochs, then switches to tracking shard 3 in the next epoch (but will still do the resharding), then wants to track shard 7 in the epoch after that.
  • In theory we could just have such a node not do the resharding and do state sync instead, but it feels kind of silly. We have the parent state, so we might as well calculate the state of the child we know we're going to track in the future.
  • On that note, we probably shouldn't do the resharding at all if we know we're not going to track the children in the next epoch or the one after. And maybe the same goes for flat storage and memtries? Basically, on the last block of the epoch, do we need to do any of this if we're not going to track any of the involved shards in the future?
  • Do we have some canonical way of asking "is this shard the child of a resharding?" In this PR that's basically what I'm trying to get at with the logic at the end of should_catch_up_shard(); a sketch of what such a helper could look like follows below.
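For reference, a hedged sketch of what that check could look like, mirroring the shard-layout calls used at the end of should_catch_up_shard() in this PR; the helper name is_resharding_child, the import paths, and the error type are my assumptions, not an existing helper:

use near_primitives::shard_layout::{ShardLayout, ShardLayoutError};
use near_primitives::types::ShardId;

// Returns true if `shard_id` in `shard_layout` is a child produced by splitting
// a parent shard from `prev_shard_layout`.
fn is_resharding_child(
    shard_layout: &ShardLayout,
    prev_shard_layout: &ShardLayout,
    shard_id: ShardId,
) -> Result<bool, ShardLayoutError> {
    // No resharding happened between the two epochs.
    if shard_layout == prev_shard_layout {
        return Ok(false);
    }
    // A shard with no parent in the new layout was not produced by a split.
    let Some(parent_shard_id) = shard_layout.try_get_parent_shard_id(shard_id)? else {
        return Ok(false);
    };
    // A shard whose parent is itself was carried over unchanged rather than split.
    Ok(parent_shard_id != shard_id)
}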


codecov bot commented Dec 13, 2024

Codecov Report

Attention: Patch coverage is 81.94444% with 13 lines in your changes missing coverage. Please review.

Project coverage is 70.48%. Comparing base (8cae8cc) to head (60185ef).

Files with missing lines     Patch %    Lines
chain/chain/src/chain.rs     81.94%     1 Missing and 12 partials ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##           master   #12617   +/-   ##
=======================================
  Coverage   70.48%   70.48%           
=======================================
  Files         845      845           
  Lines      172251   172270   +19     
  Branches   172251   172270   +19     
=======================================
+ Hits       121411   121428   +17     
+ Misses      45723    45720    -3     
- Partials     5117     5122    +5     
Flag                      Coverage Δ
backward-compatibility    0.16% <0.00%> (-0.01%) ⬇️
db-migration              0.16% <0.00%> (-0.01%) ⬇️
genesis-check             1.36% <0.00%> (-0.01%) ⬇️
linux                     69.36% <65.27%> (-0.01%) ⬇️
linux-nightly             70.05% <81.94%> (+0.01%) ⬆️
pytests                   1.67% <0.00%> (-0.01%) ⬇️
sanity-checks             1.48% <0.00%> (-0.01%) ⬇️
unittests                 70.31% <81.94%> (+<0.01%) ⬆️
upgradability             0.20% <0.00%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown.


Contributor

@wacban wacban left a comment


Requesting changes because I think it can be simplified; let me know if I'm missing something.

@@ -793,6 +793,7 @@ impl Chain {
fn get_state_sync_info(
&self,
me: &Option<AccountId>,
prev_prev_hash: &CryptoHash,
Contributor


mini nit: I slightly dislike mixing parameter types (hash & block); would it make any sense to pass hash, prev_hash and prev_prev_hash as separate arguments?

Contributor


Looking below, the other signature is:

header: &BlockHeader
prev_hash: CryptoHash,
prev_prev_hash: CryptoHash,

And maybe specify in the comment that header is the first block header of the epoch

Contributor Author


ya agreed, just changed them all to hashes and epoch IDs

@@ -2424,56 +2426,69 @@ impl Chain {
shard_tracker: &ShardTracker,
me: &Option<AccountId>,
parent_hash: &CryptoHash,
Contributor


<unrelated rant> Why do we have two names - prev and parent - for the same thing? 😢 </unrelated rant>

shard_tracker.will_care_about_shard(me.as_ref(), parent_hash, shard_id, true);
let does_care_about_shard =
shard_tracker.care_about_shard(me.as_ref(), parent_hash, shard_id, true);
) -> Result<bool, Error> {
Contributor


I agree that the old condition is no longer relevant but the logic here seems too complicated.

In V2 we needed to get a second copy of the parent, even if the node tracks it, to perform the preprocessing:
will_care_about_shard && (will_shard_layout_change || !does_care_about_shard)

In V3 this should not be needed, so it feels like the following should be sufficient:
will_care_about_shard && !does_care_about_shard

Please keep in mind that will_care_about_shard is resharding aware. We should aim to keep the resharding specific part of this logic in one place if possible.

Contributor Author

@marcelo-gonzalez marcelo-gonzalez Dec 15, 2024


In V3 this should not be needed, so it feels like the following should be sufficient: will_care_about_shard && !does_care_about_shard

So, relative to the current PR, something like:

diff --git a/chain/chain/src/chain.rs b/chain/chain/src/chain.rs
index 931f711f8..ac6b3d650 100644
--- a/chain/chain/src/chain.rs
+++ b/chain/chain/src/chain.rs
@@ -2465,32 +2465,7 @@ impl Chain {
         if shard_tracker.care_about_shard(me.as_ref(), parent_hash, shard_id, true) {
             return Ok(false);
         }
-        // Now we need to state sync it unless this is a post-resharding child shard whose parent we were tracking in the
-        // previous epoch, in which case we don't need to state sync because we'll generate the child when we do the resharding
-
-        let shard_layout = epoch_manager.get_shard_layout(epoch_id)?;
-        let prev_epoch_id = epoch_manager.get_prev_epoch_id_from_prev_block(parent_hash)?;
-        let prev_shard_layout = epoch_manager.get_shard_layout(&prev_epoch_id)?;
-
-        let resharded = shard_layout != prev_shard_layout;
-        if !resharded {
-            return Ok(true);
-        }
-        let Some(parent_shard_id) = shard_layout.try_get_parent_shard_id(shard_id)? else {
-            return Ok(true);
-        };
-        let was_split = parent_shard_id != shard_id;
-        if !was_split {
-            return Ok(true);
-        }
-
-        // Note that here passing `prev_prev_hash` to care_about_shard() will have us check whether we were tracking it in
-        // the previous epoch, because the current block is the first block of an epoch, so prev_prev_hash is the "parent_hash"
-        // of the last block of the previous epoch. TODO: consider refactoring these ShardTracker functions to accept an epoch_id
-        // to make this less tricky.
-        let splitting_child =
-            shard_tracker.care_about_shard(me.as_ref(), prev_prev_hash, parent_shard_id, true);
-        Ok(!splitting_child)
+        Ok(true)
     }
 
     /// Check if any block with missing chunk is ready to be processed and start processing these blocks

Now that the other PR is merged, we can try the modification to test_resharding_v3_shard_shuffling_slower_post_processing_tasks that I mentioned caused a crash:

diff --git a/integration-tests/src/test_loop/tests/resharding_v3.rs b/integration-tests/src/test_loop/tests/resharding_v3.rs
index c04d89ba6..457db7723 100644
--- a/integration-tests/src/test_loop/tests/resharding_v3.rs
+++ b/integration-tests/src/test_loop/tests/resharding_v3.rs
@@ -985,6 +985,7 @@ fn test_resharding_v3_shard_shuffling_slower_post_processing_tasks() {
         .track_all_shards(false)
         .all_chunks_expected(false)
         .delay_flat_state_resharding(2)
+        .epoch_length(10)
         .build();
     test_resharding_v3_base(params);
 }

Now unfortunately this passes because something about the order of the test loop task processing for state sync and resharding catchup has changed between when I wrote that and the current head of master. Without trying to figure it out, we can just hack it to run in the same order as before:

diff --git a/chain/chain/src/flat_storage_resharder.rs b/chain/chain/src/flat_storage_resharder.rs
index 83bd481f3..a401c51ce 100644
--- a/chain/chain/src/flat_storage_resharder.rs
+++ b/chain/chain/src/flat_storage_resharder.rs
@@ -609,6 +609,16 @@ impl FlatStorageResharder {
         if self.controller.is_cancelled() {
             return FlatStorageReshardingTaskResult::Cancelled;
         }
+        let head = chain_store.head().unwrap();
+        let Some(sync_hash) = chain_store.get_current_epoch_sync_hash(&head.epoch_id).unwrap() else {
+            return FlatStorageReshardingTaskResult::Postponed;
+        };
+        let key = borsh::to_vec(&near_primitives::state_sync::StatePartKey(sync_hash, shard_uid.shard_id(), 0)).unwrap();
+        let part = chain_store.store().get(near_store::DBCol::StateParts, &key).unwrap();
+        if !part.is_some() {
+            return FlatStorageReshardingTaskResult::Postponed;
+        }
+
         info!(target: "resharding", ?shard_uid, "flat storage shard catchup task started");
         let metrics = FlatStorageReshardingShardCatchUpMetrics::new(&shard_uid);
         // Apply deltas and then create the flat storage.

Then (with the above epoch length change), run it and you get:

thread 'test_loop::tests::resharding_v3::test_resharding_v3_shard_shuffling_slower_post_processing_tasks' panicked at chain/chain/src/resharding/resharding_actor.rs:93:17:
impossible to recover from a flat storage shard catchup failure!

with this error message at the end of the logs:

 5.661s ERROR resharding: flat storage shard catchup delta application failed! shard_uid=s7.v3 err=Other("unexpected resharding catchup flat storage status for s7.v3: Ready(FlatStorageReadyStatus { flat_head: BlockInfo { hash: Gposh9D271WMUy3wt4Re1pXtFViNEdiMEgWyjm8eoYHH, height: 24, prev_hash: 7d7uMfUA8dXHjGcixM88gVVnt7jaak8N5y7gfv7C58af } })")

What's happening here is the problem with account6 I mentioned in the other comment. It's not tracking the child in the current epoch, will track it in the next, but is performing the split because it tracked the parent in the previous epoch.

Please keep in mind that will_care_about_shard is resharding aware. We should aim to keep the resharding specific part of this logic in one place if possible.

Yeah it's a bit messy... maybe we should split something out of this?

@wacban
Contributor

wacban commented Dec 19, 2024

will initiate the child shard splitting if we were tracking the parent of the split shards during prev_epoch (this is true even if we don't track either child in current_epoch). So if we aren't tracking a child shard during current_epoch, we still actually don't want to state sync it if we were tracking the parent previously

Just for clarity I think you're talking about the following scenario:
T-1 - prev - tracking parent
T+0 - curr - tracking neither of the children
T+1 - next - tracking one or both of the children

That is an interesting case to think about, and the same applies even without resharding, actually.

The two options we have are:

  1. Discard the state from T-1 and state sync for T+1 as usual. This is the same as if the "hole" of not tracking was longer than 1 epoch. In this case we don't look back and only ever consider curr and next epochs.
  2. Reuse the state that we have from T-1. It feels a bit risky because, as you said, we need to look at prev epoch as well. This is less typical of a pattern in our code and it may potentially lead to some subtle issues imo.

What is your intuition about how those compare? Approach 2 is what you've implemented here. What would it take to make 1) work? Are there any drawbacks to this approach that I overlooked? Which approach would have simpler and cleaner code?
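
To make the comparison concrete, a hedged sketch of how the two options differ, again with plain booleans rather than the real shard tracker API (the function names here are hypothetical); option 2 corresponds to the sketch in the PR description above:

// Option 1: only consider current_epoch and next_epoch; always state sync a
// shard we will track but don't track now, even if resharding could produce it.
fn should_state_sync_option1(will_track_next_epoch: bool, tracks_this_epoch: bool) -> bool {
    will_track_next_epoch && !tracks_this_epoch
}

// Option 2 (what this PR implements): additionally skip state sync when the
// shard is a freshly split child whose parent we tracked in prev_epoch.
fn should_state_sync_option2(
    will_track_next_epoch: bool,
    tracks_this_epoch: bool,
    is_resharding_child: bool,
    tracked_parent_in_prev_epoch: bool,
) -> bool {
    will_track_next_epoch
        && !tracks_this_epoch
        && !(is_resharding_child && tracked_parent_in_prev_epoch)
}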
