Avoid slowdown when using multiple masks #14617

georgeee · 2023-11-27T21:22:13Z

Problem: testing with test-apply tool revealed that using multiple masks leads to 7x-8x slowdown of block application (for 128 txs). This happens because instead of a single logarithmic lookup 290 of such lookups are performed instead.

Solution: merge maps state from masks and keep it in the topmost mask. Use them for lookups, reducing 290 map lookups to a single one.

Explain how you tested your changes:

Measured performance with test-apply tool
Deployed the change on whale-1-1, confirmed performance improvement and normal operation and block production

Checklist:

Dependency versions are unchanged
- Notify Velocity team if dependencies must change in CI
Modified the current draft of release notes with details on what is completed or incomplete within this project
Document code purpose, how to use it
- Mention expected invariants, implicit constraints
Tests were added for the new behavior
- Document test purpose, significance of failures
- Test names should reflect their purpose
All tests pass (CI will check this if you didn't)
Serialized types are in stable-versioned modules
Does this close issues? List them

Closes Slow block production on berkeley #13249

nholland94

I think I get the high level concepts, but I'm having a hard time following the logic here. Rather than explaining it here on the PR, could you add comments to the mask code to make it clearer? For example, a comment at the top which describes the overall accumulator concept, and some comments on the accumulator fields to explain the current and next maps records? The logic around the detachment signals would be useful to have documented as well.

On another note, this entire changeset is unsafe in the event that we update a mask that has children. While I'm pretty sure we don't do this under normal operating conditions within the frontier, we do use masks throughout the codebase, so we would need to carefully audit all usages to make sure we never do this. I'd be more comfortable taking this sort of change if we were to actually refactor the mask interface to be immutable instead of mutable. This gives me some pause in this direction, but I understand the performance gains of this are really good. So I have a few questions:

Are the performance gains this brings necessary to unblock the issue we are seeing in Berkeley, or are the other performance gains enough?
Is there another approach we can take here which will optimize the mask chaining without having to introducing a new invariant into the masking interface?

nholland94 · 2023-11-28T23:55:11Z

src/lib/merkle_mask/masking_merkle_tree.ml

-    let self_find_hash t address =
-      assert_is_attached t ;
-      Map.find !(t.hashes) address
+    let update_maps ~f t =


NIT: since update_maps is always taking a function that is mutating fields on the record, it might make more sense to just make the fields storing a maps_t to be mutable instead, and then each function can just be implemented as an immutable update on that record of maps, allowing update_maps to hide the mutability away.

This can be implemented this way, I agree

@nholland94 could you implement this change?

ghost-not-in-the-shell · 2023-11-30T12:42:01Z

src/lib/merkle_mask/masking_merkle_tree.ml

+    }
+  [@@deriving sexp]
+
+  let maps_copy { accounts; token_owners; hashes; locations } =


Isn't this just the identity function? Should we remove this function?

This is not. Underlying type contains mutable variables, hence we're recreating variables here

Problem: 290 layers of masking trees makes processing of transactions to be O(290*logn) where n is the size of a mask. This 290x factor is a significant slowdown Solution: have a mechanism to replace O(290*logn) to something around O(30*logn). This PR is prototype, without logic integrated into frontier handling

georgeee · 2023-12-02T00:17:53Z

!ci-nightly-me

georgeee · 2023-12-02T00:28:56Z

!ci-nightly-me

ghost-not-in-the-shell · 2023-12-04T12:25:08Z

I can basically understand the ideas behind this optimization. You are caching the updates to accounts in accumulated. But I am not very clear with the changes you did in this commit: 7fbea17 Why we now have a next and current and a detached_next_signal?

Also, before this optimization, we are using remove_account_and_update_hashes to update the current map. So basically when the mask gets commited, the base ledger would notify that mask to remove the accounts in the mask. But it seems like you are using a different ways to manage the accumulated maps (namely the detached_next_signal).

It seems like the accumulated maps only gets updated here:

    let maps_and_ancestor t =
      Option.iter t.accumulated
        ~f:(fun { detached_next_signal; next; base; current = _ } ->
          if Async.Ivar.is_full detached_next_signal then
            t.accumulated <-
              Some
                { next = t.maps
                ; current = next
                ; detached_next_signal = t.detached_parent_signal
                ; base
                } ) ;
      match t.accumulated with
      | Some { current; base; _ } ->
          (current, base)
      | None ->
          (t.maps, get_parent t)

But this would only gets triggered when there's a read hits this mask. So it seems to me that the accumulated maps would be updated every k masks. My understanding is that there would be k masks with accumulated maps and then the root shifted and at this point the top mask would receive this detached_next_signal and change its detched_next_signal to be detached_parent_signal, and then we have another round k masks.

I guess without the detached_next_signal and the current/next rotation, the accumulated map would keep growing as the chain moves on, because when we unset_parent, we just reset the accumulated maps that being unparented and it won't affect its children.

This is just my understanding of what's happening here (could be wrong). Can you explain a little bit on how you intend to make it work? @georgeee

ghost-not-in-the-shell · 2023-12-04T19:16:37Z

!ci-build-me

ghost-not-in-the-shell · 2023-12-04T20:01:02Z

src/lib/merkle_mask/masking_merkle_tree.ml

+      ; accumulated =
+          Option.map t.accumulated ~f:(fun acc ->
+              { base = acc.base
+              ; detached_next_signal = detached_parent_signal


Should this be detached_next_signal instead?

Yeah it seems that the logic here is meant to be that we carry the detached_next_signal through until we hit a layer where we trigger a rotation. Depending on how we use copy in the code, this could lead to the accumulator never being rotated, causing a memory leak.

This is a bug indeed
Initial implementation got it right, then in 3dba63a I changed it to what it is now, but this was a wrong change

Problem: accumulated solution is heavily relied upon assumption of immutability of a parent ledger/mask. If this assumption is accidentally broken, this might be problematic. Solution: when a conflicting update to parent is detected, log it and reset the accumulated state all down the ancestry.

Reverts a faulty change (original version was correct). This reverts commit 3dba63a.

deepthiskumar · 2023-12-05T06:35:13Z

!ci-nightly-me

ghost-not-in-the-shell · 2023-12-05T07:35:47Z

!ci-nightly-me

ghost-not-in-the-shell · 2023-12-05T16:04:55Z

src/lib/merkle_mask/masking_merkle_tree.ml

@@ -493,7 +495,6 @@ module Make (Inputs : Inputs_intf.S) = struct
          Base.merkle_root ancestor

    let remove_account_and_update_hashes t location =
-      t.accumulated <- None ;


My original thought was that you could just remove this line here. No other changes should be required.

Among all the children of the root breadcrumb. If it's not on the canonical chain, then they would be treated as garbage and those would be collected, so I think there's no need to manually set accumulated to be None. For the mask on the canonical chain, we don't need to do anything. So remove this would be enough.

What you did here adds an extra bit of complexity to the code and also do the cleanup manually. I am not sure whether this is the best choice.

After george explained to me, now I understand that this actually does safeguard against possible bug. This does safeguard against possible errors. The newly added logs and cleanup are exactly some "safeguard" for this PR.

nholland94 · 2023-12-06T19:55:02Z

src/lib/merkle_mask/maskable_merkle_tree.ml

@@ -25,6 +25,8 @@ module Make (Inputs : Inputs_intf) = struct
  open Inputs
  include Base

+  let logger = Logger.create ()


Why are we creating a logger here instead of passing one into the functions that need it? It's best to pass the logger around in the code and not create ad-hoc loggers, as we propagate metadata through loggers.

I tried it now and it seems like I'd need to change 30-50 lines of code (I didn't finish). Do you think this is completely necessary?

This change is a safeguard and we don't expect this log to ever occur in real life. If it does, we probably will have means to deduce context of it occurring even with truncated logging context

nholland94 · 2023-12-06T20:04:11Z

src/lib/merkle_mask/maskable_merkle_tree.ml

@@ -144,8 +146,8 @@ module Make (Inputs : Inputs_intf) = struct
  let unsafe_preload_accounts_from_parent =
    Mask.Attached.unsafe_preload_accounts_from_parent

-  let register_mask t mask =
-    let attached_mask = Mask.set_parent mask t in
+  let register_mask ?accumulated t mask =


I think it's better to use an explicit named argument here instead of an optional argument. It helps to avoid bugs where we fail to thread a value through, and I'm not sure if we call this function frequently enough directly to make it worth avoid the explicit ~accumulated:None argument.

There are 32 usages of Maskable.register_mask (most in tests) and only one usage where a value is actually passed (one in Ledger module, the only place we actually want this change).

So I'd say avoiding explicit call is worth it.

nholland94 · 2023-12-06T20:21:54Z

src/lib/merkle_mask/maskable_merkle_tree.ml

+            if not (Mask.Attached.is_committing mask) then (
+              Mask.Attached.parent_set_notify mask account ;
+              let child_uuid = Mask.Attached.get_uuid mask in
+Mask.Attached.drop_accumulated mask;


I think you need a reformat.

nholland94 · 2023-12-06T23:12:21Z

src/lib/merkle_mask/masking_merkle_tree.ml

+    ( match accumulated_opt with
+    | Some { current; next; base; detached_next_signal }
+      when Option.is_none t.accumulated ->
+        maps_merge current t.maps ;


This mutably merges the contents of t.maps into the values passed in from accumulated_opt. I don't think that's what we want, and will lead to bugs.

I'm addressing this one in my update_maps refactor. I left the other issue I pointed out here till later, as it requires more changes to implement.

Well, this is actually by design. When we construct a value that we pass in accumulated_opt, we make a copy, so that I saw no need to make another copy here and merging seemed natural.

I'm not against rewriting it, that is: now it's not buggy, but could be easily misused in future.

nholland94 · 2023-12-06T23:14:50Z

src/lib/merkle_mask/masking_merkle_tree.ml

+        maps_merge current t.maps ;
+        maps_merge next t.maps ;
+        t.accumulated <- Some { current; next; base; detached_next_signal }
+    | _ ->


In the None when Option.is_some t.accumulated case, shouldn't we pull the accumulator from our new parent, and merge our local maps into it?

This is not how I designed this feature.
Now you have to explicitly use the accumulator in set_parent, otherwise old semantics of doing a local lookup in t.maps and then making a call to parent would be used.

I was cautious not to use this accumulator semantics in contexts where I probably don't need it (e.g. in tx processing). I'd prefer to do a proper refactoring before making this mechanism fully universal.

Now it functions only for 290 masks of frontier.

P.S. when re-parenting happens, there is no need to merge maps, we can continue using our own maps without invariant being altered.

nholland94 · 2023-12-06T23:42:41Z

!ci-build-me

nholland94 · 2023-12-06T23:42:46Z

!ci-nightly-me

georgeee · 2023-12-07T10:58:34Z

src/lib/merkle_mask/masking_merkle_tree.ml

+        { t.maps with
+          accounts = Location_binable.Map.empty
+        ; hashes = Addr.Map.empty
+        ; token_owners = Token_id.Map.empty


locations is missing

georgeee · 2023-12-07T11:04:41Z

src/lib/merkle_mask/masking_merkle_tree.ml

+        maps_merge current t.maps ;
+        maps_merge next t.maps ;
+        t.accumulated <- Some { current; next; base; detached_next_signal }
+    | _ ->


This is not how I designed this feature.
Now you have to explicitly use the accumulator in set_parent, otherwise old semantics of doing a local lookup in t.maps and then making a call to parent would be used.

I was cautious not to use this accumulator semantics in contexts where I probably don't need it (e.g. in tx processing). I'd prefer to do a proper refactoring before making this mechanism fully universal.

Now it functions only for 290 masks of frontier.

georgeee · 2023-12-07T11:05:45Z

src/lib/merkle_mask/masking_merkle_tree.ml

+        maps_merge current t.maps ;
+        maps_merge next t.maps ;
+        t.accumulated <- Some { current; next; base; detached_next_signal }
+    | _ ->


P.S. when re-parenting happens, there is no need to merge maps, we can continue using our own maps without invariant being altered.

georgeee · 2023-12-07T16:55:46Z

!ci-build-me

deepthiskumar · 2023-12-07T17:25:58Z

!approved-for-mainnet

deepthiskumar · 2023-12-07T17:27:54Z

!ci-nightly-me

georgeee requested a review from a team as a code owner November 27, 2023 21:22

georgeee force-pushed the feature/ledger-mask-maps branch from 098f683 to a36203f Compare November 28, 2023 18:35

georgeee force-pushed the georgeee/avoid-slowdown-of-multiple-masks branch from 95430a9 to d39a07c Compare November 28, 2023 18:38

georgeee force-pushed the feature/ledger-mask-maps branch from a36203f to 7027feb Compare November 28, 2023 20:36

georgeee force-pushed the georgeee/avoid-slowdown-of-multiple-masks branch from d39a07c to 12a8428 Compare November 28, 2023 20:37

nholland94 reviewed Nov 29, 2023

View reviewed changes

georgeee force-pushed the feature/ledger-mask-maps branch from 7027feb to 04e1b02 Compare November 29, 2023 17:51

georgeee force-pushed the georgeee/avoid-slowdown-of-multiple-masks branch from 12a8428 to 8914406 Compare November 29, 2023 17:52

ghost-not-in-the-shell reviewed Nov 30, 2023

View reviewed changes

georgeee force-pushed the feature/ledger-mask-maps branch from 04e1b02 to 4403440 Compare December 1, 2023 21:08

georgeee added 9 commits December 1, 2023 22:09

Small refactoring in Masking_merkle_tree

c3fcccf

Remove remove_accounts_exn from masking tree

cb22e03

Remove make_space_for

b90eab4

Simplify self_xxx functions

833003b

Integrate mask accumulation into ledger lifecycle

7fbea17

Fix merkle_ledger_tests

1b5779e

Fix bug in Mask's copy

3dba63a

Nit: remove Fn.compose Fn.id

d05242c

georgeee force-pushed the georgeee/avoid-slowdown-of-multiple-masks branch from ac44c5c to d05242c Compare December 1, 2023 21:09

Fix couple of bugs in reparenting

a2e68ac

Avoid force-evaluation-induced bug

4c97874

fix the order of unset_parent when unregister_mask

b6d0391

remove remove_accounts_exn

bbe1860

ghost-not-in-the-shell requested a review from a team as a code owner December 4, 2023 19:16

ghost-not-in-the-shell reviewed Dec 4, 2023

View reviewed changes

georgeee added 4 commits December 5, 2023 01:36

Actualize accumulated in to_accumulated

bacfd17

Add comments about accumulated structure

d325d03

Revert "Fix bug in Mask's copy"

7fdb2a8

Reverts a faulty change (original version was correct). This reverts commit 3dba63a.

Base automatically changed from feature/ledger-mask-maps to feature/sparse-ledger-wide-merkle-paths December 5, 2023 06:32

Base automatically changed from feature/sparse-ledger-wide-merkle-paths to feature/sparse-ledger-of-subset-no-mutability December 5, 2023 06:32

Base automatically changed from feature/sparse-ledger-of-subset-no-mutability to feature/merkle-mask-empty-account-preloading December 5, 2023 06:32

Base automatically changed from feature/merkle-mask-empty-account-preloading to feature/merkle-mask-preloading December 5, 2023 06:32

Base automatically changed from feature/merkle-mask-preloading to feature/improved-merkle-masks-for-staged-ledger December 5, 2023 06:33

Base automatically changed from feature/improved-merkle-masks-for-staged-ledger to feature/batch-account-lookups December 5, 2023 06:33

Base automatically changed from feature/batch-account-lookups to rampup December 5, 2023 06:34

Merge branch 'rampup' into georgeee/avoid-slowdown-of-multiple-masks

ee733d7

ghost-not-in-the-shell reviewed Dec 5, 2023

View reviewed changes

ghost-not-in-the-shell mentioned this pull request Dec 6, 2023

Investigation of long async cycles during initialization on testworld-2-0 #14665

Open

ghost-not-in-the-shell approved these changes Dec 6, 2023

View reviewed changes

nholland94 reviewed Dec 6, 2023

View reviewed changes

Refactor ledger mask update_maps; fix mutation bug in set_parent

e2b4ff6

georgeee commented Dec 7, 2023

View reviewed changes

fixup! Refactor ledger mask update_maps; fix mutation bug in set_parent

4f8bb8e

deepthiskumar merged commit 5c0eb5b into rampup Dec 7, 2023
2 checks passed

deepthiskumar deleted the georgeee/avoid-slowdown-of-multiple-masks branch December 7, 2023 20:27

Avoid slowdown when using multiple masks #14617

Avoid slowdown when using multiple masks #14617

Conversation

georgeee commented Nov 27, 2023

nholland94 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

georgeee commented Dec 2, 2023

georgeee commented Dec 2, 2023

ghost-not-in-the-shell commented Dec 4, 2023 • edited Loading

ghost-not-in-the-shell commented Dec 4, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

deepthiskumar commented Dec 5, 2023

ghost-not-in-the-shell commented Dec 5, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

nholland94 Dec 6, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

georgeee Dec 7, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

nholland94 commented Dec 6, 2023

nholland94 commented Dec 6, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

georgeee commented Dec 7, 2023

deepthiskumar commented Dec 7, 2023

deepthiskumar commented Dec 7, 2023

ghost-not-in-the-shell commented Dec 4, 2023 •

edited

Loading

nholland94 Dec 6, 2023 •

edited

Loading

georgeee Dec 7, 2023 •

edited

Loading