
fix(resharding): fix Chain::get_shards_to_state_sync() #12617

Open
wants to merge 3 commits into master

Conversation

marcelo-gonzalez
Contributor

This function was left unimplemented for the resharding v3 case, so here we implement it. For each shard ID, if there hasn't been a resharding, we say that we want to state sync it if we'll be tracking it next epoch and don't currently track it. If there has been a resharding, we state sync it if the previous condition is true and we aren't currently generating the shard itself as part of resharding.

Since this shards-to-state-sync calculation is made when we preprocess the first block of an epoch, there are 3 relevant epochs here: prev_epoch, current_epoch, next_epoch. When we process the last block of prev_epoch, start_resharding() will be called and will initiate the child shard splitting if we were tracking the parent of the split shards during prev_epoch (this is true even if we don't track either child in current_epoch). So if we aren't tracking a child shard during current_epoch, we still don't want to state sync it if we were tracking the parent previously.
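As a rough illustration of that decision, here is a minimal sketch with plain booleans; the real code in should_catch_up_shard() goes through the epoch manager and shard tracker, so the function and parameter names below are only illustrative, not the actual API:

// Sketch of the per-shard decision described above, reduced to booleans.
fn should_state_sync_shard(
    will_track_next_epoch: bool,        // will we track this shard in next_epoch?
    tracks_this_epoch: bool,            // do we already track it in current_epoch?
    is_resharding_child: bool,          // was this shard just split off from a parent?
    tracked_parent_in_prev_epoch: bool, // did we track the parent during prev_epoch?
) -> bool {
    // We only ever state sync shards we will track next epoch but don't track now.
    if !will_track_next_epoch || tracks_this_epoch {
        return false;
    }
    // If this is a freshly split child whose parent we tracked, the resharding
    // started at the end of prev_epoch will generate the child state locally,
    // so state sync is unnecessary.
    !(is_resharding_child && tracked_parent_in_prev_epoch)
}

fn main() {
    // A node that tracked the parent, tracks neither child now, and will track
    // a child next epoch: resharding covers it, no state sync.
    assert!(!should_state_sync_shard(true, false, true, true));
    // The same tracking pattern without a resharding: state sync is needed.
    assert!(should_state_sync_shard(true, false, false, false));
}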

@marcelo-gonzalez marcelo-gonzalez requested a review from a team as a code owner December 13, 2024 08:21
@marcelo-gonzalez
Contributor Author

A few things:

  • After landing fix(resharding): wait until child flat storages are split to take snapshots #12589, we can add a test that fails without this PR: just set the epoch length to 10 in test_resharding_v3_shard_shuffling_slower_post_processing_tasks. That makes it fail because one of the nodes (account6) tracks shard 6 (the parent of the split shards) in the first 2 epochs, then switches to tracking shard 3 in the next epoch (but will still do the resharding), then wants to track shard 7 in the epoch after that.
  • In theory we could just have such a node not do the resharding and do state sync instead, but it feels kind of silly. We have the parent state, so we might as well calculate the state of the child we know we're going to track in the future.
  • On that note, we probably shouldn't do the resharding at all if we know we're not going to track the children in the next epoch or the one after. And maybe the same goes for flat storage and memtries? Basically, on the last block of the epoch, do we need to do any of this if we're not going to track any of the involved shards in the future?
  • Do we have some canonical way of asking "is this shard the child of a resharding?" In this PR that's basically what I'm trying to get at with the logic at the end of should_catch_up_shard(); a sketch of what such a helper could look like follows below.
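For reference, a hedged sketch of what that check could look like, mirroring the shard-layout calls used at the end of should_catch_up_shard() in this PR; the helper name is_resharding_child, the import paths, and the error type are my assumptions, not an existing helper:

use near_primitives::shard_layout::{ShardLayout, ShardLayoutError};
use near_primitives::types::ShardId;

// Returns true if `shard_id` in `shard_layout` is a child produced by splitting
// a parent shard from `prev_shard_layout`.
fn is_resharding_child(
    shard_layout: &ShardLayout,
    prev_shard_layout: &ShardLayout,
    shard_id: ShardId,
) -> Result<bool, ShardLayoutError> {
    // No resharding happened between the two epochs.
    if shard_layout == prev_shard_layout {
        return Ok(false);
    }
    // A shard with no parent in the new layout was not produced by a split.
    let Some(parent_shard_id) = shard_layout.try_get_parent_shard_id(shard_id)? else {
        return Ok(false);
    };
    // A shard whose parent is itself was carried over unchanged rather than split.
    Ok(parent_shard_id != shard_id)
}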


codecov bot commented Dec 13, 2024

Codecov Report

Attention: Patch coverage is 81.94444% with 13 lines in your changes missing coverage. Please review.

Project coverage is 70.48%. Comparing base (8cae8cc) to head (60185ef).

Files with missing lines     Patch %    Lines
chain/chain/src/chain.rs     81.94%     1 Missing and 12 partials ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##           master   #12617   +/-   ##
=======================================
  Coverage   70.48%   70.48%           
=======================================
  Files         845      845           
  Lines      172251   172270   +19     
  Branches   172251   172270   +19     
=======================================
+ Hits       121411   121428   +17     
+ Misses      45723    45720    -3     
- Partials     5117     5122    +5     
Flag                      Coverage Δ
backward-compatibility    0.16% <0.00%> (-0.01%) ⬇️
db-migration              0.16% <0.00%> (-0.01%) ⬇️
genesis-check             1.36% <0.00%> (-0.01%) ⬇️
linux                     69.36% <65.27%> (-0.01%) ⬇️
linux-nightly             70.05% <81.94%> (+0.01%) ⬆️
pytests                   1.67% <0.00%> (-0.01%) ⬇️
sanity-checks             1.48% <0.00%> (-0.01%) ⬇️
unittests                 70.31% <81.94%> (+<0.01%) ⬆️
upgradability             0.20% <0.00%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown.


Contributor

@wacban wacban left a comment


Requesting changes because I think it can be simplified; let me know if I'm missing something.

@@ -793,6 +793,7 @@ impl Chain {
fn get_state_sync_info(
&self,
me: &Option<AccountId>,
prev_prev_hash: &CryptoHash,
Contributor


mini nit: I slightly dislike mixing parameter types (hash & block); would it make any sense to pass hash, prev_hash and prev_prev_hash as separate arguments?

Contributor


Looking below, the other signature is:

header: &BlockHeader
prev_hash: CryptoHash,
prev_prev_hash: CryptoHash,

And maybe specify in the comment that header is the first block header of the epoch

Contributor Author


ya agreed, just changed them all to hashes and epoch IDs

@@ -2424,56 +2426,69 @@ impl Chain {
shard_tracker: &ShardTracker,
me: &Option<AccountId>,
parent_hash: &CryptoHash,
Contributor


<unrelated rant> Why do we have two names - prev and parent - for the same thing? 😢 </unrelated rant>

shard_tracker.will_care_about_shard(me.as_ref(), parent_hash, shard_id, true);
let does_care_about_shard =
shard_tracker.care_about_shard(me.as_ref(), parent_hash, shard_id, true);
) -> Result<bool, Error> {
Contributor


I agree that the old condition is no longer relevant but the logic here seems too complicated.

In V2 we needed to get a second copy of the parent, even if the node tracks it, to perform the preprocessing:
will_care_about_shard && (will_shard_layout_change || !does_care_about_shard)

In V3 this should not be needed, so it feels like the following should be sufficient:
will_care_about_shard && !does_care_about_shard

Please keep in mind that will_care_about_shard is resharding aware. We should aim to keep the resharding specific part of this logic in one place if possible.

Contributor Author

@marcelo-gonzalez marcelo-gonzalez Dec 15, 2024


In V3 this should not be needed, so it feels like the following should be sufficient: will_care_about_shard && !does_care_about_shard

So, relative to the current PR, something like:

diff --git a/chain/chain/src/chain.rs b/chain/chain/src/chain.rs
index 931f711f8..ac6b3d650 100644
--- a/chain/chain/src/chain.rs
+++ b/chain/chain/src/chain.rs
@@ -2465,32 +2465,7 @@ impl Chain {
         if shard_tracker.care_about_shard(me.as_ref(), parent_hash, shard_id, true) {
             return Ok(false);
         }
-        // Now we need to state sync it unless this is a post-resharding child shard whose parent we were tracking in the
-        // previous epoch, in which case we don't need to state sync because we'll generate the child when we do the resharding
-
-        let shard_layout = epoch_manager.get_shard_layout(epoch_id)?;
-        let prev_epoch_id = epoch_manager.get_prev_epoch_id_from_prev_block(parent_hash)?;
-        let prev_shard_layout = epoch_manager.get_shard_layout(&prev_epoch_id)?;
-
-        let resharded = shard_layout != prev_shard_layout;
-        if !resharded {
-            return Ok(true);
-        }
-        let Some(parent_shard_id) = shard_layout.try_get_parent_shard_id(shard_id)? else {
-            return Ok(true);
-        };
-        let was_split = parent_shard_id != shard_id;
-        if !was_split {
-            return Ok(true);
-        }
-
-        // Note that here passing `prev_prev_hash` to care_about_shard() will have us check whether we were tracking it in
-        // the previous epoch, because the current block is the first block of an epoch, so prev_prev_hash is the "parent_hash"
-        // of the last block of the previous epoch. TODO: consider refactoring these ShardTracker functions to accept an epoch_id
-        // to make this less tricky.
-        let splitting_child =
-            shard_tracker.care_about_shard(me.as_ref(), prev_prev_hash, parent_shard_id, true);
-        Ok(!splitting_child)
+        Ok(true)
     }
 
     /// Check if any block with missing chunk is ready to be processed and start processing these blocks

Now that the other PR is merged, we can try the modification to test_resharding_v3_shard_shuffling_slower_post_processing_tasks that I mentioned caused a crash:

diff --git a/integration-tests/src/test_loop/tests/resharding_v3.rs b/integration-tests/src/test_loop/tests/resharding_v3.rs
index c04d89ba6..457db7723 100644
--- a/integration-tests/src/test_loop/tests/resharding_v3.rs
+++ b/integration-tests/src/test_loop/tests/resharding_v3.rs
@@ -985,6 +985,7 @@ fn test_resharding_v3_shard_shuffling_slower_post_processing_tasks() {
         .track_all_shards(false)
         .all_chunks_expected(false)
         .delay_flat_state_resharding(2)
+        .epoch_length(10)
         .build();
     test_resharding_v3_base(params);
 }

Now unfortunately this passes because something about the order of the test loop task processing for state sync and resharding catchup has changed between when I wrote that and the current head of master. Without trying to figure it out, we can just hack it to run in the same order as before:

diff --git a/chain/chain/src/flat_storage_resharder.rs b/chain/chain/src/flat_storage_resharder.rs
index 83bd481f3..a401c51ce 100644
--- a/chain/chain/src/flat_storage_resharder.rs
+++ b/chain/chain/src/flat_storage_resharder.rs
@@ -609,6 +609,16 @@ impl FlatStorageResharder {
         if self.controller.is_cancelled() {
             return FlatStorageReshardingTaskResult::Cancelled;
         }
+        let head = chain_store.head().unwrap();
+        let Some(sync_hash) = chain_store.get_current_epoch_sync_hash(&head.epoch_id).unwrap() else {
+            return FlatStorageReshardingTaskResult::Postponed;
+        };
+        let key = borsh::to_vec(&near_primitives::state_sync::StatePartKey(sync_hash, shard_uid.shard_id(), 0)).unwrap();
+        let part = chain_store.store().get(near_store::DBCol::StateParts, &key).unwrap();
+        if !part.is_some() {
+            return FlatStorageReshardingTaskResult::Postponed;
+        }
+
         info!(target: "resharding", ?shard_uid, "flat storage shard catchup task started");
         let metrics = FlatStorageReshardingShardCatchUpMetrics::new(&shard_uid);
         // Apply deltas and then create the flat storage.

Then (with the above epoch length change), run it and you get:

thread 'test_loop::tests::resharding_v3::test_resharding_v3_shard_shuffling_slower_post_processing_tasks' panicked at chain/chain/src/resharding/resharding_actor.rs:93:17:
impossible to recover from a flat storage shard catchup failure!

with this error message at the end of the logs:

 5.661s ERROR resharding: flat storage shard catchup delta application failed! shard_uid=s7.v3 err=Other("unexpected resharding catchup flat storage status for s7.v3: Ready(FlatStorageReadyStatus { flat_head: BlockInfo { hash: Gposh9D271WMUy3wt4Re1pXtFViNEdiMEgWyjm8eoYHH, height: 24, prev_hash: 7d7uMfUA8dXHjGcixM88gVVnt7jaak8N5y7gfv7C58af } })")

What's happening here is the problem with account6 I mentioned in the other comment. It's not tracking the child in the current epoch, will track it in the next, but is performing the split because it tracked the parent in the previous epoch.

Please keep in mind that will_care_about_shard is resharding aware. We should aim to keep the resharding specific part of this logic in one place if possible.

Yeah it's a bit messy... maybe we should split something out of this?

@wacban
Contributor

wacban commented Dec 19, 2024

will initiate the child shard splitting if we were tracking the parent of the split shards during prev_epoch (this is true even if we don't track either child in current_epoch). So if we aren't tracking a child shard during current_epoch, we still actually don't want to state sync it if we were tracking the parent previously

Just for clarity I think you're talking about the following scenario:
T-1 - prev - tracking parent
T+0 - curr - tracking neither of the children
T+1 - next - tracking one or both of the children

That is an interesting case to think about, and the same applies even without resharding, actually.

The two options we have are:

  1. Discard the state from T-1 and state sync for T+1 as usual. This is the same as if the "hole" of not tracking was longer than 1 epoch. In this case we don't look back and only ever consider curr and next epochs.
  2. Reuse the state that we have from T-1. It feels a bit risky because, as you said, we need to look at prev epoch as well. This is less typical of a pattern in our code and it may potentially lead to some subtle issues imo.

What is your intuition about how those compare? Approach 2 is what you've implemented here. What would it take to make 1) work? Are there any drawbacks to this approach that I overlooked? Which approach would have simpler and cleaner code?
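
To make the comparison concrete, a hedged sketch of how the two options differ, again with plain booleans rather than the real shard tracker API (the function names here are hypothetical); option 2 corresponds to the sketch in the PR description above:

// Option 1: only consider current_epoch and next_epoch; always state sync a
// shard we will track but don't track now, even if resharding could produce it.
fn should_state_sync_option1(will_track_next_epoch: bool, tracks_this_epoch: bool) -> bool {
    will_track_next_epoch && !tracks_this_epoch
}

// Option 2 (what this PR implements): additionally skip state sync when the
// shard is a freshly split child whose parent we tracked in prev_epoch.
fn should_state_sync_option2(
    will_track_next_epoch: bool,
    tracks_this_epoch: bool,
    is_resharding_child: bool,
    tracked_parent_in_prev_epoch: bool,
) -> bool {
    will_track_next_epoch
        && !tracks_this_epoch
        && !(is_resharding_child && tracked_parent_in_prev_epoch)
}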
