Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(blockchain): modify location and error handling of forkchoiceUpdated engine API calls #2394

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

shotes
Copy link
Contributor

@shotes shotes commented Jan 22, 2025

This PR is to resolve issue #2405

There are several fixes in here:

  • call NotifyForkchoiceUpdate immediately after VerifyAndNotifyNewPayload. This validates that the EL is willing, able, and DOES, update the head of the chain to the proposed payload. This is the primary fix and resolves handling around the ACCEPTED status. This also enforces that errors in the forkchoiceUpdated propagate to consensus and aren't ignored.
  • Fix a bug where an optimistic builder could send a forkchoice update that would attempt to un-finalize a block under one the following conditions (both of these conditions were bugs before the modifications in this PR with the immediate forkchoiceUpdates, but become much easier to accidentally happen with the new changes):
    1. A forkchoice update is sent at height B to set X as the final block and Y as the head. After sending this forkchoice update (in FinalizeBlock), the round does not reach consensus for some reason. Then the optimistic builder also fails a proposal that triggers an optimistic rebuild in rebuildPayloadForRejectedBlock(). This will attempt to set W (the block before X) as the final block, while setting X as the head in order to rebuild a block on X.
    2. The exact same scenario as 1, except instead of rebuilding optimistically, the proposer is attempting to grab an optimistic build, doesn't find one, and then requests a synchronous build via RequestPayloadSync(), which would result in the same issue.

Edit: my concerns about 1 and 2 are incorrect. In the live codebase (not this PR), fork choice updates are only sent in FinalizeBlock, and those block hashes are never to be changed, even if FinalizeBlock is called multiple times.

Copy link

codecov bot commented Jan 22, 2025

Codecov Report

Attention: Patch coverage is 0% with 26 lines in your changes missing coverage. Please review.

Project coverage is 27.11%. Comparing base (419cd66) to head (bf2fdc0).

Files with missing lines Patch % Lines
state-transition/core/state_processor_payload.go 0.00% 19 Missing ⚠️
beacon/blockchain/payload.go 0.00% 3 Missing ⚠️
execution/engine/engine.go 0.00% 2 Missing ⚠️
beacon/validator/block_builder.go 0.00% 1 Missing ⚠️
state-transition/core/state_processor.go 0.00% 1 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #2394      +/-   ##
==========================================
+ Coverage   27.08%   27.11%   +0.02%     
==========================================
  Files         351      350       -1     
  Lines       15543    15528      -15     
  Branches       20       20              
==========================================
  Hits         4210     4210              
+ Misses      11130    11115      -15     
  Partials      203      203              
Files with missing lines Coverage Δ
beacon/blockchain/finalize_block.go 0.00% <ø> (ø)
beacon/validator/block_builder.go 0.00% <0.00%> (ø)
state-transition/core/state_processor.go 0.00% <0.00%> (ø)
execution/engine/engine.go 0.00% <0.00%> (ø)
beacon/blockchain/payload.go 0.00% <0.00%> (ø)
state-transition/core/state_processor_payload.go 0.00% <0.00%> (ø)

@shotes shotes force-pushed the fix-ErrAcceptedPayloadStatus-handling branch from b55b546 to 6791ec6 Compare January 23, 2025 04:02
@@ -89,13 +89,9 @@ func (ee *Engine) NotifyForkchoiceUpdate(
switch {
// We do not bubble the error up, since we want to handle it
// in the same way as the other cases.
case errors.IsAny(
err,
engineerrors.ErrAcceptedPayloadStatus,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ErrAcceptedPayloadStatus does not get returned by notifyForkchoiceUpdated engine API calls.

@shotes shotes changed the title WIP: Remove ErrAcceptedPayloadStatus special case fix(blockchain): modify location and error handling of forkchoiceUpdated engine API calls Jan 24, 2025
@shotes shotes marked this pull request as ready for review January 24, 2025 07:35
@shotes shotes requested a review from a team as a code owner January 24, 2025 07:35
engineerrors.ErrAcceptedPayloadStatus,
engineerrors.ErrSyncingPayloadStatus,
):
case errors.Is(err, engineerrors.ErrSyncingPayloadStatus):
Copy link
Contributor Author

@shotes shotes Jan 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the block is syncing, we cannot verify it. This change requires more thought. How does this effect a new node syncing to the chain? Will this break that?

Copy link
Contributor Author

@shotes shotes Jan 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After thinking about this more, I think this is right. Please tell me why this is wrong.

  • If the EL is syncing, we cannot verify the block. Therefore we should not give it the OK in a proposal.
  • If we are a simple node catching up to the network, replaying blocks, then FinalizeBlock will be executed sequentially, guaranteeing that the EL always has the previous block. Therefore it should never be possible to get the SYNCING status.
  • In fact, it should be theoretically impossible for this status to happen, as we now ALWAYS send the payload via newPayload right before the forkchoiceUpdated. So it should not be possible for the EL to not have the previous block.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

@rezbera rezbera Jan 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the block is syncing, we cannot verify it. This change requires more thought. How does this effect a new node syncing to the chain? Will this break that?

We have had situations where the EL cannot find peers, and as such is stuck in a syncing state. Previously, consensus would continue but now it will error out. If we introduce this, we need to have some sort of indefinite looping that waits until the EL is synced.

Copy link
Contributor

@rezbera rezbera Jan 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are a simple node catching up to the network, replaying blocks, then FinalizeBlock will be executed sequentially, guaranteeing that the EL always has the previous block. Therefore it should never be possible to get the SYNCING status.

Theoretically this is the case but we observed that when syncing full nodes without EL peers, it was returning "SYNCING" status - something to investigate.

Comment on lines +167 to +171
// Since we have single slot finality, the previous block is already
// considered final from cometBFT perspective. The newPayload being
// submitted from this proposal should be set as the new head of the
// chain, and we can update the finalized block with the EL. We send the
// FCU now which fully verifies that the block is VALID or INVALID.
Copy link
Contributor

@rezbera rezbera Jan 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This bubbles up to the s.stateProcessor.Transition call which is also called in ProcessProposal.

Previously we would only set the FCU head in the EL after FinalizeBlock as we confirmed with Informal that this was never going to revert.

Now it seems possible that the "head", i.e. "latest" tag could revert given that we are setting it before FinalizeBlock. This would require that anyone waiting for finality must wait for the "finalized" tag in the EL which is friction we would have to discuss with DX

switch err := sp.processExecutionPayload(ctx, st, blk); {
case err == nil:
// keep going with the processing
case errors.Is(err, engineerrors.ErrAcceptedPayloadStatus):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree with removal here. this was useless with the comment being inconsistent

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants