-
Notifications
You must be signed in to change notification settings - Fork 249
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix inherents race condition #1974
Conversation
…o present in `SubspaceLink`
…orkaround from slot worker
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sense overall! Plz also updates https://github.com/subspace/subspace/blob/1c9a7825267ecab281b97acbc9919f8b47d01645/test/subspace-test-service/src/lib.rs#L843 to reflect the change in domain tests.
@NingLin-P looks like that broke some execution tests, could you look into it? I think there were some implicit assumptions about which blocks are already imported there. |
Yeah, the domain tests broke due to some event ordering assumptions are broke. Before the changes, we made solid assumptions about the ordering of the following events and ensured the previous event must finish before next event (even though they are running is different workers concurrently) :
With the change, event 3 can't guaranteed to be finished before the next slot, thus the operator may skip to produce bundle and further skip to produce domain block because there is no bundle in the consensus block. For now, I don't have a better idea than to revert the change in the domain test 🙁 |
This is not guaranteed to be the very next block.
One of the next blocks will have that bundle, but there is no guarantee which one. I think test was relying on fragine logic it should not really do that, it is not how things work in actual chain anyway. |
It is not fragile, the ordering is guaranteed in
It is how the actual chain work in most case, where each step have seconds of interval to finish. But there are certainly a lot of scenarios not covered by the test because each domain test is focused on and dedicated to a specific case, while testing how the operator handles consensus chain fork we do not care about what happens if there is an empty consensus block in the new fork because empty consensus block is already tested in another test. |
There are several possible solutions here:
I would prefer the first option for sure. While yes, it works in many cases on real chain the way you describe it, it doesn't always work like that. |
This is the exact current approach, the first option will make the test more verbose and complicated because in some tests we have It is impossible to cover all the scenarios and the combination of them in the test (the number is exponential), so we just chose a relatively efficient way. |
Well, I'll be relying on you do to something about it I think. |
I have tested locally that the domain test is fixed with #2087, plz resolve the conflict. |
…ndition # Conflicts: # crates/sc-consensus-subspace/src/slot_worker.rs
@NingLin-P one of the domains tests hanged apparently on Windows and macOS |
I can't reproduce it on my macOS, I have re-run the CI job to see how it goes. |
Are you using nextest as well? |
Yeah, created NingLin-P#8 to debug and see if it can be reproduced in github-host CI runner. |
@NingLin-P same test hanged on |
…ndition # Conflicts: # crates/sc-consensus-subspace/src/lib.rs
@NingLin-P I have pulled latest |
…t in test This commit sending one more notification to the operator to ensure it have received the previous consensus block import notification and ensure it have finish processing it before return. This commit also clean up some workaround introduced previously that no more necessary or is fixed in upstream. Signed-off-by: linning <[email protected]>
dce257b
to
56c07b0
Compare
Sorry for the annoying force push, it is used to remove some commits that I brought to this PR accidentally, the last commit is the only change I pushed which should fix the CI ideally, PTAL. |
According to force-push you just updated some logs (removing a bunch of them from default levels actually). Not sure how it is supposed to fix anything. UPD: Ah, there were other commit before that, okay... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some nits, let's see if CI passes
Signed-off-by: linning <[email protected]>
Fixes #871, see #871 (comment) for the reason why this was happening.
Note that previously notification about importing block was technically happening AFTER block was imported (but before next block in import queue is imported, which wasn't the only way to get blocks imported), now it is happening BEFORE block is actually imported into Substrate.
So happy to finally understand what is going on there 😌
Code contributor checklist: