Migration to v1.3.0 can cause missed events when contract listeners are listening from latest #1531

EnriqueL8 · 2024-06-20T20:46:41Z

As part of the v1.3.0 release, we rearchitected the way event streams from the transaction managers are configured against the namespaces in FireFly. FireFly core will now create one event stream per namespace and as such we wrote code to migrate the existing contract listeners on the previous event stream to the corresponding namespace event stream.

Contract listeners have an option of where to listen from where the choices are accepted:

oldest which defaults to the first block
a specific block number
latest which will be start listen from the creation of the listener.

As such when the listeners get recreated as part of the migration, the ones set to latest will now listen from a new latest and in the time where the chain has advance and FireFly is starting up some event could get lost! This can get worse with another bug where if FireFly cannot communicate with the Transaction Manager to create the listener it will give up and never listen to those events!

A way to solve this would be:

Retrieve the previous contract listeners created against the default eventstream
Find the checkpoint value - i.e the block at which this listener has caught up to!
When creating the new contract listener against the namespace listener set the option to listen from the checkpoint!

This was we do not lose events.

Looking at the code, this change is tricker than it sounds because the original change simply at startup creates an event stream and apply the listeners created for the namespace. There is no notion of "migration" so this code will need to introduce some code to get the existing listeners on the "global" event stream and hopefully be able to using some identifier correlate them to the new ones that need to be created.

The text was updated successfully, but these errors were encountered:

peterbroadhurst · 2024-06-21T13:20:09Z

Concrete proposal - which has other benefits:

For FFTM based connectors (EVMConnect) we do the work to ensure FireFly Core and query the latest checkpoint for a listener
We update the migration code to ask the Go blockchain plugin if it can get the checkpoint for any migration of listeners, and do one of these - in order:
- ~~Use the checkpoint if it can be obtained (EVMConnect only)~~
  - see below - unclear this is optimization is feasible, and does not add anything over the next item
- Look up the most recent event dispatched by the connector, and if one is found then require the blockchain plugin to decode the protocolId back into a fromBlock (all connectors - EthConnect/Fabric)
- If no event, then fall back to the original approach

peterbroadhurst · 2024-06-21T20:50:45Z

The listener endpoint on FFTM already has the checkpoint information embedded, so it doesn't look like changes would be needed to the FFTM or EVMConnect codebases for Core to be able to query the checkpoint information:

https://github.com/hyperledger/firefly-transaction-manager/blob/1ac00b5eb37664d73abebd8822ae8439cef3f4bc/pkg/fftm/stream_management.go#L376-L385

peterbroadhurst · 2024-06-23T18:33:25Z

This code confirms existence of custom listeners. Per the comment it's not specific to the migration scenario:

firefly/internal/contracts/manager.go

Lines 156 to 161 in 16aec0b

    
           // Validate all our listeners exist on startup - consistent with the multi-party manager. 
        
           // This means if EVMConnect is cleared, simply a restart of the namespace on core will 
        
           // cause recreation of all the listeners (noting that listeners that were specified to start 
        
           // from latest, will start from the new latest rather than replaying from the block they 
        
           // started from before they were deleted). 
        
           return cm, cm.verifyListeners(ctx)

This calls down to ethereum specific code, which gets the subscription in the context of the e.streamID that is designated to the namespace:

firefly/internal/blockchain/ethereum/ethereum.go

Lines 909 to 913 in 16aec0b

    
           esID := e.streamID[namespace] 
        
           sub, err := e.streams.getSubscription(ctx, subID, okNotFound) 
        
           if err != nil || sub == nil || sub.Stream != esID { 
        
           	return false, nil, core.ContractListenerStatusUnknown, err 
        
           }

So from this I believe that the code for migration is rather leaning on simply the recreation code that was introduced (during the 1.2->1.3 development) for a different reason - the recreation of listeners in EVMConnect if they had been deleted.

Need to dig into what happens to the old listeners that were at a global level 👁️

peterbroadhurst · 2024-06-23T19:10:24Z

Reading through #1388 and the code, it seems like the original event streams are just left and ignored. However, they are not referred to in any way currently. So implementing a read of their checkpoint would be complicated.

So I'm going to investigate locating/decoding/passing the last-event information.

peterbroadhurst · 2024-06-23T19:23:02Z

For MultiParty networks (where used) the FireFly batch pin contract follows the same pattern, and there's even less code change as the namespace was included in the listener name already in 1.2:

firefly/internal/blockchain/ethereum/ethereum.go

Lines 285 to 289 in 16aec0b

    
           streamID, ok := e.streamID[namespace.Name] 
        
           if !ok { 
        
           	return "", i18n.NewError(ctx, coremsgs.MsgInternalServerError, "eventstream ID not found") 
        
           } 
        
           sub, err := e.streams.ensureFireFlySubscription(ctx, namespace.Name, version, ethLocation.Address, contract.FirstEvent, streamID, batchPinEventABI)

peterbroadhurst · 2024-06-23T19:31:04Z

The thing that distinguishes multi-party BatchPin events from all other events in the blockchainevents table, is the lack of a listener_id.

peterbroadhurst · 2024-06-23T19:36:17Z

For Tokens, there is a strong leaning to using either 0 or a specific block number in the code. When a user doesn't specify a block number, it goes to 0 in the token connector before being passed to EVMConnect. There are many cases we see specific block numbers being used to designate the contract installation block for a token. However, currently I don't see any likelihood that token connectors would have been coerced with a blockNumber: "latest".

So I am not going to work on addressing that, as it would mean a change to the /activatepool flow (which I believe gets now called on every re-connect of a namespace to a token connector, against every pool) to look up the last event and provide it as extra information to handle only the latest special case.

I will look to discuss this a little further with @awrichar

peterbroadhurst · 2024-06-24T17:21:59Z

@EnriqueL8 @awrichar - I have submitted PR #1534

Open conversations I'd like your view on this week:

This only addresses the newest/latest case, it does not address the inefficiency of replaying all transactions since block 0 (or another specific block)
- Should we extend the change to always use this logic if the lastProtocolID block is greater-than the registration block?
Token connectors:
- Question outlined in previous comment

EnriqueL8 · 2024-06-25T11:04:57Z

Thanks for all the thinking!

I'm happy with not addressing this for the token connectors based on the above points.
It would be create to solve the inefficiency as catching up from large chain as part of the upgrade can take a long time. Wouldn't it almost always be the case that the lastProtocolID block is greater-than the registration block?

peterbroadhurst · 2024-06-25T17:18:34Z

So just thinking about the implications, it will simply be that the listener in EVMConnect/fabconnect will show a fromBlock that isn't the only you specified on the listener itself.

I think that's justifiable for the benefit you get - should roll up to the release notes for 1.3.1 for sure, and maybe we need a tweak to the reference section that talks about the event bus, to cover this.

Happy to take both those items on.

peterbroadhurst · 2024-06-27T01:44:01Z

I've updated the PR to handle the 0 and explicit block cases for Fabric and Ethereum

EnriqueL8 · 2024-06-27T10:30:00Z

Just added the documentation tag to make sure we document this new behaviour and add it to the release notes

EnriqueL8 · 2024-06-27T10:32:58Z

So just thinking about the implications, it will simply be that the listener in EVMConnect/fabconnect will show a fromBlock that isn't the only you specified on the listener itself.

Agreed, we should just document that as part of upgrade your listeners will get recreated and as part of this it will take the new "latest" block so it does not loose events. We query the checkpoint information from the connector when retrieving a listener so users will see it straight away

EnriqueL8 · 2024-07-11T15:03:10Z

This has been merged!

EnriqueL8 added bug Something isn't working priority labels Jun 20, 2024

peterbroadhurst mentioned this issue Jun 24, 2024

When re-creating listeners in EVMConnect/EthConnect/FabConnect use the last event block #1534

Merged

peterbroadhurst mentioned this issue Jun 24, 2024

Bug at startup where FireFly gives up creating listeners if cannot reach transaction manager #1532

Open

EnriqueL8 added the documentation Improvements or additions to documentation label Jun 27, 2024

EnriqueL8 added this to the 1.3.1 milestone Jul 3, 2024

EnriqueL8 closed this as completed Jul 11, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Migration to v1.3.0 can cause missed events when contract listeners are listening from latest #1531

Migration to v1.3.0 can cause missed events when contract listeners are listening from latest #1531

EnriqueL8 commented Jun 20, 2024

peterbroadhurst commented Jun 21, 2024 •

edited

Loading

peterbroadhurst commented Jun 21, 2024

peterbroadhurst commented Jun 23, 2024

peterbroadhurst commented Jun 23, 2024

peterbroadhurst commented Jun 23, 2024

peterbroadhurst commented Jun 23, 2024

peterbroadhurst commented Jun 23, 2024

peterbroadhurst commented Jun 24, 2024

EnriqueL8 commented Jun 25, 2024 •

edited

Loading

peterbroadhurst commented Jun 25, 2024

peterbroadhurst commented Jun 27, 2024

EnriqueL8 commented Jun 27, 2024

EnriqueL8 commented Jun 27, 2024

EnriqueL8 commented Jul 11, 2024

Migration to v1.3.0 can cause missed events when contract listeners are listening from latest #1531

Migration to v1.3.0 can cause missed events when contract listeners are listening from latest #1531

Comments

EnriqueL8 commented Jun 20, 2024

peterbroadhurst commented Jun 21, 2024 • edited Loading

peterbroadhurst commented Jun 21, 2024

peterbroadhurst commented Jun 23, 2024

peterbroadhurst commented Jun 23, 2024

peterbroadhurst commented Jun 23, 2024

peterbroadhurst commented Jun 23, 2024

peterbroadhurst commented Jun 23, 2024

peterbroadhurst commented Jun 24, 2024

EnriqueL8 commented Jun 25, 2024 • edited Loading

peterbroadhurst commented Jun 25, 2024

peterbroadhurst commented Jun 27, 2024

EnriqueL8 commented Jun 27, 2024

EnriqueL8 commented Jun 27, 2024

EnriqueL8 commented Jul 11, 2024

peterbroadhurst commented Jun 21, 2024 •

edited

Loading

EnriqueL8 commented Jun 25, 2024 •

edited

Loading