Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: header downloading get stuck #4459

Merged
merged 2 commits into from
Jul 1, 2024
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
17 changes: 10 additions & 7 deletions src/chain_sync/tipset_syncer.rs
Original file line number Diff line number Diff line change
Expand Up @@ -822,7 +822,7 @@ async fn sync_headers_in_reverse<DB: Blockstore + Sync + Send + 'static>(

#[allow(deprecated)] // Tracking issue: https://github.com/ChainSafe/forest/issues/3157
let wp = WithProgressRaw::new("Downloading headers", total_size as u64);
while pending_tipsets.last().epoch() > until_epoch {
'outer: while pending_tipsets.last().epoch() > until_epoch {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the until_epoch is null, it relies on break 'outer; to break this loop

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we avoid the inner/outer loop complexity by refactoring this to smaller functions that could potentially be unit tested? At least the block for tipset in network_tipsets {

Copy link
Contributor Author

@hanabi1224 hanabi1224 Jul 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid the 'outer label, how about this

        // breaks the loop when `until_epoch` has been added to `pending_tipsets`
        // or when `until_epoch` is null. Note that in the latter case, the outer
        // while condition is always true, and it relies on this flag to break
        // the loop
        let should_terminate = 'append_tipsets: {
            for tipset in network_tipsets {
                // This could happen when the `until_epoch` is a null epoch
                if tipset.epoch() < until_epoch {
                    break 'append_tipsets true;
                }
                validate_tipset_against_cache(bad_block_cache, tipset.key(), &accepted_blocks)?;
                accepted_blocks.extend(tipset.cids());
                tracker.write().set_epoch(tipset.epoch());
                pending_tipsets.push(tipset);
            }
            false
        };
        if should_terminate {
            break;
        }

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's up to you, but I'd still recommend extracting the loop to a separate function. This makes it possible to test and is subjectively easier for the reader of the parent function to grasp.

Copy link
Contributor Author

@hanabi1224 hanabi1224 Jul 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about this then? It still highly relies on the comments to explain this piece of code though

// tipsets is sorted by epoch in descending order
// returns true when `until_epoch_inclusive` is overreached
fn for_each_tipset_until_epoch_overreached(
    tipsets: impl IntoIterator<Item = Arc<Tipset>>,
    until_epoch_inclusive: ChainEpoch,
    mut callback: impl FnMut(Arc<Tipset>) -> Result<(), TipsetRangeSyncerError>,
) -> Result<bool, TipsetRangeSyncerError> {
    for tipset in tipsets {
        if tipset.epoch() < until_epoch_inclusive {
            return Ok(true);
        }
        callback(tipset)?;
    }
    Ok(false)
}

        let callback = |tipset: Arc<Tipset>| {
            validate_tipset_against_cache(bad_block_cache, tipset.key(), &accepted_blocks)?;
            accepted_blocks.extend(tipset.cids());
            tracker.write().set_epoch(tipset.epoch());
            pending_tipsets.push(tipset);
            Ok(())
        };
        // Breaks the loop when `until_epoch` is overreached, which happens
        // when there are null tipsets in the queried range.
        // Note that when the `until_epoch` is null, the outer while condition
        // is always true, and it relies on this flag to break the loop
        if for_each_tipset_until_epoch_overreached(network_tipsets, until_epoch, callback)? {
            // Breaks when the `until_epoch` is overreached.
            break;
        }

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this has been queued, I will make this refactor in a separate PR and let's discuss there

let oldest_pending_tipset = pending_tipsets.last();
let work_to_be_done = oldest_pending_tipset.epoch() - until_epoch + 1;
wp.set((work_to_be_done - total_size).unsigned_abs());
Expand All @@ -842,17 +842,20 @@ async fn sync_headers_in_reverse<DB: Blockstore + Sync + Send + 'static>(
continue;
}

let epoch_diff = oldest_pending_tipset.epoch() - current_head.epoch();
let window = min(epoch_diff, MAX_TIPSETS_TO_REQUEST as i64);
let window = min(
oldest_pending_tipset.epoch() - until_epoch, // (oldest_pending_tipset.epoch() - 1) - until_epoch + 1
MAX_TIPSETS_TO_REQUEST as i64,
);
let network_tipsets = network
.chain_exchange_headers(None, oldest_pending_tipset.parents(), window as u64)
.await
.map_err(TipsetRangeSyncerError::NetworkTipsetQueryFailed)?;

for tipset in network_tipsets
.into_iter()
.take_while(|ts| ts.epoch() >= until_epoch)
{
for tipset in network_tipsets {
// This could happen when the `until_epoch` is a null epoch
if tipset.epoch() < until_epoch {
break 'outer;
}
validate_tipset_against_cache(bad_block_cache, tipset.key(), &accepted_blocks)?;
accepted_blocks.extend(tipset.cids());
tracker.write().set_epoch(tipset.epoch());
Expand Down
Loading