Optimize AccountIndexIterator to reuse last loaded bin-map item range #4729

HaoranYi · 2025-01-31T19:16:26Z

Problem

AccountIndexIterator fetches the items from accounts map in batches, when it is configured "sorted". For each batch, it fetch the result from the underlying bin map by loading, filtering and sorting. However, when a batch only partially consumes a bin map, next batch will start loading/filtering/sorting the same bin map from the beginning again.

Imagining that a bin map contains 10K items, each batch size is 1K, then when we return 10 batches, we will be loading/filtering/sorting the bin-map for 10 times. There is a lot of waste.

We should optimize this by caching the bin-map loaded result in the iterator, so that we only load/filter/sort each bin map at most once.

Summary of Changes

Optimize AccountIndexIterator to reuse last loaded bin-map range.

Also, refactor AccountIndexInterator into two separate iterators - sorted and unsorted. This specialization will improve the IPC of the core loop for both types of iterators.

Fixes #

HaoranYi · 2025-02-10T16:05:08Z

I have been running this PR for the past 3 days on mainnet with out any issues.

HaoranYi · 2025-02-13T16:16:49Z

I pushed a commit to fix the conflicts after rebasing to master.

brooksprumo · 2025-02-14T14:34:30Z

I kinda feel that the sorted vs unsorted iterators are fundamentally different enough where we could (should?) have two separate iterator types and implementations. Wdyt?

HaoranYi · 2025-02-14T14:45:20Z

I kinda feel that the sorted vs unsorted iterators are fundamentally different enough where we could (should?) have two separate iterator types and implementations. Wdyt?

yes. i think it is good idea.
i will refactor them into two iterators.

accounts-db/src/accounts_index.rs

HaoranYi · 2025-02-27T17:42:49Z

@brooksprumo can you help to review this PR again?

brooksprumo · 2025-02-27T18:41:53Z

@brooksprumo can you help to review this PR again?

Sure, will do!

brooksprumo

I need to go through the core of the implementation still. There's some code organization that is making the review a bit challenging for me in Github. I'll pull the code locally and see if that helps.

accounts-db/src/accounts_index.rs

HaoranYi · 2025-02-28T17:53:09Z

@brooksprumo I pushed a commit to implement sorted/unsorted iterator in a different way.
Instead of using an inner class, I used trait with default implementation to share the common code. And I also get rid of the enum wapper, added a new test for unsorted iterator.

Can you take a look again? thanks!

brooksprumo · 2025-02-28T19:24:16Z

@brooksprumo I pushed a commit to implement sorted/unsorted iterator in a different way. Instead of using an inner class, I used trait with default implementation to share the common code. And I also get rid of the enum wapper, added a new test for unsorted iterator.

I like this direction. Can we go one step further and remove AccountsIndexIteratorCommon too? I'm not seeing a lot of benefit here, since both the sorted and unsorted iterators already implement std::iter::Iterator, which should allow them to work interchangeably by the caller. Maybe if there were more shared code then there would be more use default impl within the trait and thus make it more useful? I think mainly I wonder, if we fully duplicate all the code between the two iterators, how much is actually different? I'm thinking maybe not a lot? Wdyt?

HaoranYi · 2025-02-28T19:50:24Z

ok. pushed another change to remove the trait and unshare the common code between two types of iterators.
I keep the refactor of clone_bound and bin_from_bound out of the class method, so that these two can still be shared.

let me know wdyt.

brooksprumo

Wow, ok, I've spent a lot of time staring at all this. Sorry for all the refactor/nit questions/comments. Let me know your thoughts on how to proceed. Some things of the code currently in master feel broken and I'm not sure if/when/where to fix it all.

brooksprumo · 2025-02-28T20:51:23Z

accounts-db/src/accounts_index.rs

+    fn bin_start_and_range(&self) -> (usize, usize) {
+        let start_bin = self.start_bin();
+        // calculate the max range of bins to look in
+        let end_bin_inclusive = self.end_bin_inclusive();
+        let bin_range = if start_bin > end_bin_inclusive {
+            0 // empty range
+        } else if end_bin_inclusive == usize::MAX {
+            usize::MAX
+        } else {
+            // the range is end_inclusive + 1 - start
+            // end_inclusive could be usize::MAX already if no bound was specified
+            end_bin_inclusive.saturating_add(1) - start_bin
+        };
+        (start_bin, bin_range)
+    }


This impl looks identical to me between Sorted and Unsorted. Can this be refactored into a bare function that takes start_bin and end_bin_inclusive as parameters?

Then the next() impls can call the bare function directly.

(We'll also need to do this for hold_range_in_memory(), but that's fine too. This function probably should not be in the iterator interface at all. We can refactor that in a subsequent PR, or here.

Exactly, that's why I tried to introduce the inner class / the trait to share all these functions.
Should we revert this commit and go back with the trait implementation?

done.

moved bin_start_and_range to free fn.

inlined fn range to next.

move hold_range_in_memory to AccountsIndex.

brooksprumo · 2025-02-28T20:53:00Z

accounts-db/src/accounts_index.rs

+    fn start_bin(&self) -> usize {
+        // start in bin where 'start_bound' would exist
+        bin_from_bound(self.bin_calculator, &self.start_bound, 0)
+    }
+
+    fn end_bin_inclusive(&self) -> usize {
+        // end in bin where 'end_bound' would exist
+        bin_from_bound(self.bin_calculator, &self.end_bound, usize::MAX)
+    }


nit: Can the ordering of the methods be the same in Unsorted and Sorted, please? This makes it easier to compare the code for both side by side.

brooksprumo · 2025-02-28T20:56:52Z

accounts-db/src/accounts_index.rs

+    fn range<R>(map: &AccountMaps<T, U>, range: R) -> RangeItemVec<T>
+    where
+        R: RangeBounds<Pubkey> + std::fmt::Debug,
+    {
+        let mut result = map.items(&range);
+        result.sort_unstable_by(|a, b| a.0.cmp(&b.0));
+        result
+    }


Feels like range() here can go too (i.e. be inlined into new()). We only call it in one place, and it doesn't seem all that useful on its own. This is even more true for the impl on Unsorted.

(And then can the RangeItemVec alias be removed too, please?)

done.

But we still need RangeItemVec. it is used in last_bin_range to make clippy happy.

error: very complex type used. Consider factoring parts into `type` definitions --> accounts-db/src/accounts_index.rs:591:21 | 591 | last_bin_range: Option<(usize, Vec<(Pubkey, AccountMapEntry<T>)>)>, | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#type_complexity = note: `-D clippy::type-complexity` implied by `-D warnings` = help: to override `-D warnings` add `#[allow(clippy::type_complexity)]`

brooksprumo · 2025-02-28T21:22:43Z

accounts-db/src/accounts_index.rs

+            let mut range = Self::range(&map, (self.start_bound, self.end_bound));
+            chunk.append(&mut range);
+        }
+        self.is_finished = true;


Lol, the more I look at the original code the more confused I am. This Unsorted iterator doesn't actually iterate at all... it just returns all the items in the range, and does not respect ITER_BATCH_SIZE at all. There will only ever be a single "iteration" of this iterator.

We probably want the unsorted iterator to respect the batch size though? That or we don't call this thing an iterator... Ugh, that'd be a lot of change that is not really related to this PR. I'm not sure what to do: (1) fix the unsorted iterator in master first, (2) leave the unsorted iterator unchanged in this pr, or (3) fix it within this PR. I'm inclined to think (1) or (2), but not (3). Wdyt?

We may want to put back in the

self.start_bound = Excluded(chunk.last().unwrap().0);

line at the end of the function though. If we do want to respect the batch size in the future, that line will be required.

I prefer 2 to keep the unsorted behavior the same.
We are not seeing any issue with unsorted iter yet. We may want to keep it as it is.

brooksprumo · 2025-02-28T22:23:46Z

accounts-db/src/accounts_index.rs

+        let iter: Box<dyn Iterator<Item = Vec<(Pubkey, AccountMapEntry<T>)>>> =
+            if returns_items == AccountsIndexIteratorReturnsItems::Sorted {
+                Box::new(self.iter_sorted(range.as_ref()))
+            } else {
+                Box::new(self.iter_unsorted(range.as_ref()))
+            };


Ugh... crap... right. If we don't use an enum on the two iter impls, then we need dynamic dispatch, or we need to make a fn around the for loop below with an impl Iterator parameter.

Oh neat, the Either crate handles this!
https://docs.rs/either/latest/either/enum.Either.html#method.iter

(not saying we need to use either, just sharing)

yeah. that's why i used enum for static dispatch in the first place.
But I think it shouldn't matter much for performance to use the dynamic dispatch here.
For unsorted iter, we only call next once. For sorted iter, we call next every 1000 elements.
we should be fine to use dyn dispatch here.

HaoranYi · 2025-03-01T00:53:17Z

pushed one more commit to save two copies of pubkey in hold_range_in memory.

After inlining the hold_range_in memory, we already have a ref of Range. so we can use as_ref to avoid the clone of the pubkeys.

HaoranYi · 2025-03-01T01:51:41Z

one more change on the iterator to take ref of pubkey range instead of copy.

HaoranYi · 2025-03-03T21:48:03Z

After #4729 is merged, we can optimize the account index iterator more by using the lowest_pubkey and highest_pubkey range on the account's index map.

HaoranYi · 2025-03-04T19:53:56Z

After several rounds of the back-and-forth reviewing and discussion, I decided
to restructure the AccountIndex and AccountIndexIterator, and rewrite the
AccountIndexIterator algorithm, instead of keeping the current structure of the
code.

These are the highlight of the changes.

move all the bin function out of AccountIndexIterator to AccountIndex
because AccountIndex owns the bin maps. And at iterator level, we don't know
the total number of binmaps. We are using u32::max instead of the actual binmap
number, which is not accurate and could lead to error if not use carefully.
move hold_range_in_memory out of AccountIndexIterator to AccountIndex
because this function should belong to the iterator.
rewrite the iterator algorithm. The old algorithm has discrepancy between
sorted and unsorted implementation. For unsorted iter, it doesn't do batch,
so it could be memory intensive. For sorted iter, it does do batch but it
redo the load/filter/sort on the binmap. And it was written with unnecessary
complexity, i.e. break/goto. The new algorithm is simpler and clearer. It is
also unified for both sort and unsorted iterators. They both return the items
in batch and avoid redo the load/filter/sort on the binmap. Each binmap is
processed at most once.
minor tweaks in the iterator interface so that it take pubkey range by ref
and avoid cloning the pubkey range.
refactor and add more tests.

jeffwashington · 2025-03-04T20:38:13Z

Sorry - I'm missing context. What is the problem? Soon, we'll stop iterating during rent collection. Who else does these index scans?

jeffwashington · 2025-03-04T20:54:29Z

Sorry - I'm missing context. What is the problem? Soon, we'll stop iterating during rent collection. Who else does these index scans?

On mnb activation schedule: skip rewrites is #3 and stop collecting rent is #4.
However, it looks like we are still scanning/loading all accounts during rent collection even with the activation of these 2 features. Is that accurate?

HaoranYi · 2025-03-04T22:03:39Z

Sorry - I'm missing context. What is the problem? Soon, we'll stop iterating during rent collection. Who else does these index scans?

On mnb activation schedule: skip rewrites is #3 and stop collecting rent is #4. However, it looks like we are still scanning/loading all accounts during rent collection even with the activation of these 2 features. Is that accurate?

Yes, it is correct.
Rent scan may have a impact on a corner case where we update rent_epoch during rent scan. Currently, the rent scan may potentially update the rent_epoch to rent exempted if its rent_epoch is not already rent exempted. That's why we are still required to do rent scan even though we don't collect any rent.

We have another SIMD, which will exclude rent_epoch field changes in rent scan. When that SIMD be active, we can then safely disable rent scan.

Also, this iterator function is used by RPC method to scan accounts from index.

HaoranYi · 2025-03-04T22:06:44Z

These are the relevant SIMDS.
solana-foundation/solana-improvement-documents#230
solana-foundation/solana-improvement-documents#175

HaoranYi changed the title ~~optimize AccountIndexIter to reuse last loaded item range~~ optimize AccountIndexIterator to reuse last loaded bin-map item range Jan 31, 2025

HaoranYi force-pushed the opt_index_iter branch 2 times, most recently from 1b9589c to 4ed42d0 Compare February 5, 2025 21:46

HaoranYi force-pushed the opt_index_iter branch from 4ed42d0 to 1bfd02e Compare February 10, 2025 16:04

HaoranYi requested review from brooksprumo and jeffwashington February 10, 2025 16:06

HaoranYi force-pushed the opt_index_iter branch from 1bfd02e to 0fe2226 Compare February 13, 2025 15:58

HaoranYi commented Feb 14, 2025

View reviewed changes

accounts-db/src/accounts_index.rs Show resolved Hide resolved

HaoranYi force-pushed the opt_index_iter branch from be2f7b3 to 33a74c6 Compare February 14, 2025 19:06

HaoranYi commented Feb 14, 2025

View reviewed changes

accounts-db/src/accounts_index.rs Show resolved Hide resolved

HaoranYi force-pushed the opt_index_iter branch 2 times, most recently from 56078ca to 9cfe2ab Compare February 14, 2025 20:23

HaoranYi force-pushed the opt_index_iter branch from 9cfe2ab to 7ae7553 Compare February 26, 2025 16:30

HaoranYi changed the title ~~optimize AccountIndexIterator to reuse last loaded bin-map item range~~ Optimize AccountIndexIterator to reuse last loaded bin-map item range Feb 27, 2025

brooksprumo reviewed Feb 27, 2025

View reviewed changes

HaoranYi requested a review from brooksprumo February 28, 2025 00:55

brooksprumo reviewed Feb 28, 2025

View reviewed changes

HaoranYi added 4 commits March 1, 2025 00:39

optimize AccountIndexIter to reuse last loaded item range

edb2e10

fix a bug

769701f

clippy

93cc69b

add test

55a0a8c

HaoranYi added 7 commits March 1, 2025 00:39

const

95f5b8b

fix build after rebase

6ade425

refactor to separate accountindexiter into sorted and unsorted

e5210bd

refactor

90801b2

refactor out trait

65c63bb

reorder

ee69c15

more pr refactor

efc835b

HaoranYi force-pushed the opt_index_iter branch from fe3b043 to efc835b Compare March 1, 2025 00:39

use as_ref to save a copy of pubkey range

1c03f84

remove more clone of pubkeys in iterator

67d1383

HaoranYi added 3 commits March 3, 2025 15:22

clippy

9f631a1

refactor flatmap

e0b1aa0

comments

56fca93

rework iter

82d1520

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimize AccountIndexIterator to reuse last loaded bin-map item range #4729

Optimize AccountIndexIterator to reuse last loaded bin-map item range #4729

HaoranYi commented Jan 31, 2025 •

edited

Loading

HaoranYi commented Feb 10, 2025

HaoranYi commented Feb 13, 2025 •

edited

Loading

brooksprumo commented Feb 14, 2025

HaoranYi commented Feb 14, 2025

HaoranYi commented Feb 27, 2025

brooksprumo commented Feb 27, 2025

brooksprumo left a comment

HaoranYi commented Feb 28, 2025

brooksprumo commented Feb 28, 2025

HaoranYi commented Feb 28, 2025 •

edited

Loading

brooksprumo left a comment

brooksprumo Feb 28, 2025

HaoranYi Feb 28, 2025

HaoranYi Feb 28, 2025

brooksprumo Feb 28, 2025

HaoranYi Feb 28, 2025

brooksprumo Feb 28, 2025

HaoranYi Feb 28, 2025 •

edited

Loading

brooksprumo Feb 28, 2025

HaoranYi Feb 28, 2025

brooksprumo Feb 28, 2025

brooksprumo Feb 28, 2025 •

edited

Loading

HaoranYi Feb 28, 2025 •

edited

Loading

HaoranYi commented Mar 1, 2025

HaoranYi commented Mar 1, 2025

HaoranYi commented Mar 3, 2025

HaoranYi commented Mar 4, 2025 •

edited

Loading

jeffwashington commented Mar 4, 2025

jeffwashington commented Mar 4, 2025 •

edited

Loading

HaoranYi commented Mar 4, 2025 •

edited

Loading

HaoranYi commented Mar 4, 2025

Optimize AccountIndexIterator to reuse last loaded bin-map item range #4729

Are you sure you want to change the base?

Optimize AccountIndexIterator to reuse last loaded bin-map item range #4729

Conversation

HaoranYi commented Jan 31, 2025 • edited Loading

Problem

Summary of Changes

HaoranYi commented Feb 10, 2025

HaoranYi commented Feb 13, 2025 • edited Loading

brooksprumo commented Feb 14, 2025

HaoranYi commented Feb 14, 2025

HaoranYi commented Feb 27, 2025

brooksprumo commented Feb 27, 2025

brooksprumo left a comment

Choose a reason for hiding this comment

HaoranYi commented Feb 28, 2025

brooksprumo commented Feb 28, 2025

HaoranYi commented Feb 28, 2025 • edited Loading

brooksprumo left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

HaoranYi Feb 28, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

brooksprumo Feb 28, 2025 • edited Loading

Choose a reason for hiding this comment

HaoranYi Feb 28, 2025 • edited Loading

Choose a reason for hiding this comment

HaoranYi commented Mar 1, 2025

HaoranYi commented Mar 1, 2025

HaoranYi commented Mar 3, 2025

HaoranYi commented Mar 4, 2025 • edited Loading

jeffwashington commented Mar 4, 2025

jeffwashington commented Mar 4, 2025 • edited Loading

HaoranYi commented Mar 4, 2025 • edited Loading

HaoranYi commented Mar 4, 2025

HaoranYi commented Jan 31, 2025 •

edited

Loading

HaoranYi commented Feb 13, 2025 •

edited

Loading

HaoranYi commented Feb 28, 2025 •

edited

Loading

HaoranYi Feb 28, 2025 •

edited

Loading

brooksprumo Feb 28, 2025 •

edited

Loading

HaoranYi Feb 28, 2025 •

edited

Loading

HaoranYi commented Mar 4, 2025 •

edited

Loading

jeffwashington commented Mar 4, 2025 •

edited

Loading

HaoranYi commented Mar 4, 2025 •

edited

Loading