-
Notifications
You must be signed in to change notification settings - Fork 925
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(availability): prevent parallel availability calls #3883
fix(availability): prevent parallel availability calls #3883
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3883 +/- ##
==========================================
+ Coverage 44.83% 45.33% +0.49%
==========================================
Files 265 309 +44
Lines 14620 21789 +7169
==========================================
+ Hits 6555 9877 +3322
- Misses 7313 10854 +3541
- Partials 752 1058 +306 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- When is this happening in practice?
- Besides someone else calling ShareAvailabe over RPC, shouldn't DASer ensure there is only one availability per height at a time?
- This seems like a broader problem for all availabilities and remind me of SyncGetter(:triggered:) we still missing. Shouldn't we do that for all availabilties instead of light?
Sync summary:
You are right, ShareAvailable over RPC can cause inconsistent sample states in the store. While DASer isn't supposed to call availability for the same height in parallel, it can happen. DASer is built so that multiple ShareAvailable calls are possible and doesn't strictly enforce a single call per height.
There are only two types of availability:
This PR adds transaction guarantees for light availability only, which is why the fix is specific to it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Am fine with this change - I do agree with hlib's comment re extending single flight protection to both calls (for full availability too) even though this circumstance only really happens from external RPC call.
One thing about it though is that we have an instance of this pattern already in go-header where we delegate the flight protection to the caller (syncer) rather than the component itself, but here it is delegated to the component. Just a thought, but we should probably try to choose one pattern.
Just to clarify, full availability doesn’t need this since the issue can’t happen there. In full availability, the state is kept in one place and updated all at once. But in light availability, the state is split between datastore and blockstore and updated separately.
We can’t delegate to the caller here because there are two possible parallel callers from different components: DASer and RPC. Maybe for go-header, if there are potentially multiple callers, it should also handle state access internally instead of relying on the caller. |
# Conflicts: # share/availability/light/availability.go # share/availability/light/availability_test.go
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
utack
Parallel availability calls could simultaneously select different set of uniq coordinates for block sampling. The selection would be overwritten by last call, which would result in more samples in store than needed. It would make light node to do more sampling work than needed and also would create problems for future pruning of additional samples.
Provided unit explains the problem the best. With removed change, it would result in 176 samples in store instead of 16.