-
Notifications
You must be signed in to change notification settings - Fork 549
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid slowdown when using multiple masks #14617
Avoid slowdown when using multiple masks #14617
Conversation
098f683
to
a36203f
Compare
95430a9
to
d39a07c
Compare
a36203f
to
7027feb
Compare
d39a07c
to
12a8428
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I get the high level concepts, but I'm having a hard time following the logic here. Rather than explaining it here on the PR, could you add comments to the mask code to make it clearer? For example, a comment at the top which describes the overall accumulator concept, and some comments on the accumulator fields to explain the current
and next
maps records? The logic around the detachment signals would be useful to have documented as well.
On another note, this entire changeset is unsafe in the event that we update a mask that has children. While I'm pretty sure we don't do this under normal operating conditions within the frontier, we do use masks throughout the codebase, so we would need to carefully audit all usages to make sure we never do this. I'd be more comfortable taking this sort of change if we were to actually refactor the mask interface to be immutable instead of mutable. This gives me some pause in this direction, but I understand the performance gains of this are really good. So I have a few questions:
- Are the performance gains this brings necessary to unblock the issue we are seeing in Berkeley, or are the other performance gains enough?
- Is there another approach we can take here which will optimize the mask chaining without having to introducing a new invariant into the masking interface?
let self_find_hash t address = | ||
assert_is_attached t ; | ||
Map.find !(t.hashes) address | ||
let update_maps ~f t = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: since update_maps
is always taking a function that is mutating fields on the record, it might make more sense to just make the fields storing a maps_t
to be mutable instead, and then each function can just be implemented as an immutable update on that record of maps, allowing update_maps
to hide the mutability away.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be implemented this way, I agree
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nholland94 could you implement this change?
7027feb
to
04e1b02
Compare
12a8428
to
8914406
Compare
} | ||
[@@deriving sexp] | ||
|
||
let maps_copy { accounts; token_owners; hashes; locations } = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this just the identity
function? Should we remove this function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not. Underlying type contains mutable variables, hence we're recreating variables here
04e1b02
to
4403440
Compare
Problem: 290 layers of masking trees makes processing of transactions to be O(290*logn) where n is the size of a mask. This 290x factor is a significant slowdown Solution: have a mechanism to replace O(290*logn) to something around O(30*logn). This PR is prototype, without logic integrated into frontier handling
ac44c5c
to
d05242c
Compare
!ci-nightly-me |
!ci-nightly-me |
I can basically understand the ideas behind this optimization. You are caching the updates to accounts in Also, before this optimization, we are using remove_account_and_update_hashes to update the current map. So basically when the mask gets commited, the base ledger would notify that mask to remove the accounts in the mask. But it seems like you are using a different ways to manage the accumulated maps (namely the It seems like the
But this would only gets triggered when there's a read hits this mask. So it seems to me that the accumulated maps would be updated every I guess without the This is just my understanding of what's happening here (could be wrong). Can you explain a little bit on how you intend to make it work? @georgeee |
!ci-build-me |
; accumulated = | ||
Option.map t.accumulated ~f:(fun acc -> | ||
{ base = acc.base | ||
; detached_next_signal = detached_parent_signal |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be detached_next_signal
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah it seems that the logic here is meant to be that we carry the detached_next_signal
through until we hit a layer where we trigger a rotation. Depending on how we use copy
in the code, this could lead to the accumulator never being rotated, causing a memory leak.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bug indeed
Initial implementation got it right, then in 3dba63a I changed it to what it is now, but this was a wrong change
Problem: accumulated solution is heavily relied upon assumption of immutability of a parent ledger/mask. If this assumption is accidentally broken, this might be problematic. Solution: when a conflicting update to parent is detected, log it and reset the accumulated state all down the ancestry.
Reverts a faulty change (original version was correct). This reverts commit 3dba63a.
!ci-nightly-me |
1 similar comment
!ci-nightly-me |
@@ -493,7 +495,6 @@ module Make (Inputs : Inputs_intf.S) = struct | |||
Base.merkle_root ancestor | |||
|
|||
let remove_account_and_update_hashes t location = | |||
t.accumulated <- None ; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My original thought was that you could just remove this line here. No other changes should be required.
Among all the children of the root breadcrumb. If it's not on the canonical chain, then they would be treated as garbage and those would be collected, so I think there's no need to manually set accumulated
to be None. For the mask on the canonical chain, we don't need to do anything. So remove this would be enough.
What you did here adds an extra bit of complexity to the code and also do the cleanup manually. I am not sure whether this is the best choice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After george explained to me, now I understand that this actually does safeguard against possible bug. This does safeguard against possible errors. The newly added logs and cleanup are exactly some "safeguard" for this PR.
@@ -25,6 +25,8 @@ module Make (Inputs : Inputs_intf) = struct | |||
open Inputs | |||
include Base | |||
|
|||
let logger = Logger.create () |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we creating a logger here instead of passing one into the functions that need it? It's best to pass the logger around in the code and not create ad-hoc loggers, as we propagate metadata through loggers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried it now and it seems like I'd need to change 30-50 lines of code (I didn't finish). Do you think this is completely necessary?
This change is a safeguard and we don't expect this log to ever occur in real life. If it does, we probably will have means to deduce context of it occurring even with truncated logging context
@@ -144,8 +146,8 @@ module Make (Inputs : Inputs_intf) = struct | |||
let unsafe_preload_accounts_from_parent = | |||
Mask.Attached.unsafe_preload_accounts_from_parent | |||
|
|||
let register_mask t mask = | |||
let attached_mask = Mask.set_parent mask t in | |||
let register_mask ?accumulated t mask = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's better to use an explicit named argument here instead of an optional argument. It helps to avoid bugs where we fail to thread a value through, and I'm not sure if we call this function frequently enough directly to make it worth avoid the explicit ~accumulated:None
argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are 32 usages of Maskable.register_mask
(most in tests) and only one usage where a value is actually passed (one in Ledger
module, the only place we actually want this change).
So I'd say avoiding explicit call is worth it.
if not (Mask.Attached.is_committing mask) then ( | ||
Mask.Attached.parent_set_notify mask account ; | ||
let child_uuid = Mask.Attached.get_uuid mask in | ||
Mask.Attached.drop_accumulated mask; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you need a reformat.
( match accumulated_opt with | ||
| Some { current; next; base; detached_next_signal } | ||
when Option.is_none t.accumulated -> | ||
maps_merge current t.maps ; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This mutably merges the contents of t.maps
into the values passed in from accumulated_opt
. I don't think that's what we want, and will lead to bugs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm addressing this one in my update_maps
refactor. I left the other issue I pointed out here till later, as it requires more changes to implement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, this is actually by design. When we construct a value that we pass in accumulated_opt
, we make a copy, so that I saw no need to make another copy here and merging seemed natural.
I'm not against rewriting it, that is: now it's not buggy, but could be easily misused in future.
maps_merge current t.maps ; | ||
maps_merge next t.maps ; | ||
t.accumulated <- Some { current; next; base; detached_next_signal } | ||
| _ -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the None when Option.is_some t.accumulated
case, shouldn't we pull the accumulator from our new parent, and merge our local maps into it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not how I designed this feature.
Now you have to explicitly use the accumulator in set_parent
, otherwise old semantics of doing a local lookup in t.maps
and then making a call to parent would be used.
I was cautious not to use this accumulator semantics in contexts where I probably don't need it (e.g. in tx processing). I'd prefer to do a proper refactoring before making this mechanism fully universal.
Now it functions only for 290 masks of frontier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
P.S. when re-parenting happens, there is no need to merge maps, we can continue using our own maps without invariant being altered.
!ci-build-me |
!ci-nightly-me |
{ t.maps with | ||
accounts = Location_binable.Map.empty | ||
; hashes = Addr.Map.empty | ||
; token_owners = Token_id.Map.empty |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
locations
is missing
maps_merge current t.maps ; | ||
maps_merge next t.maps ; | ||
t.accumulated <- Some { current; next; base; detached_next_signal } | ||
| _ -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not how I designed this feature.
Now you have to explicitly use the accumulator in set_parent
, otherwise old semantics of doing a local lookup in t.maps
and then making a call to parent would be used.
I was cautious not to use this accumulator semantics in contexts where I probably don't need it (e.g. in tx processing). I'd prefer to do a proper refactoring before making this mechanism fully universal.
Now it functions only for 290 masks of frontier.
maps_merge current t.maps ; | ||
maps_merge next t.maps ; | ||
t.accumulated <- Some { current; next; base; detached_next_signal } | ||
| _ -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
P.S. when re-parenting happens, there is no need to merge maps, we can continue using our own maps without invariant being altered.
!ci-build-me |
!approved-for-mainnet |
!ci-nightly-me |
Problem: testing with
test-apply
tool revealed that using multiple masks leads to 7x-8x slowdown of block application (for 128 txs). This happens because instead of a single logarithmic lookup 290 of such lookups are performed instead.Solution: merge maps state from masks and keep it in the topmost mask. Use them for lookups, reducing 290 map lookups to a single one.
Explain how you tested your changes:
test-apply
toolwhale-1-1
, confirmed performance improvement and normal operation and block productionChecklist: