
[wip]perf: cache block related data in ChainStore #13006

Draft · wants to merge 10 commits into master

Conversation

@mooori (Contributor) commented Feb 27, 2025

ChainStore queries currently go to rocksdb, as described in #12997. In particular, the queries triggered by check_transaction_validity_period account for 21% of the profile.

Currently this PR is a proof of concept to validate and measure the performance improvement of adding a cache. On my local workstation I observed the following improvements for apply-range with native transfers (vs. the master commit this PR is based on):

  • With 100 accounts: 29% increase from 60k to 77.5k TPS
  • With 10k accounts: 6% increase from 17.1k to 18.2k TPS
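
In essence this is a read-through cache in front of the rocksdb columns: look the value up in memory first, and on a miss read it from disk and memoize the result. A minimal sketch of that pattern (placeholder types stand in for nearcore's CryptoHash and Block; the actual diff keeps one map per DBCol):

use std::cell::RefCell;
use std::collections::HashMap;

// Simplified stand-ins for nearcore's CryptoHash and Block types.
type CryptoHash = [u8; 32];
#[derive(Clone)]
struct Block;

struct ChainStore {
    block_cache: RefCell<HashMap<CryptoHash, Block>>,
}

impl ChainStore {
    fn get_block(&self, hash: &CryptoHash) -> Option<Block> {
        // Fast path: repeated queries are served from memory.
        if let Some(block) = self.block_cache.borrow().get(hash) {
            return Some(block.clone());
        }
        // Slow path: fall back to the rocksdb column and memoize the result.
        let block = self.read_block_from_db(hash)?;
        self.block_cache.borrow_mut().insert(*hash, block.clone());
        Some(block)
    }

    fn read_block_from_db(&self, _hash: &CryptoHash) -> Option<Block> {
        None // placeholder for the existing DBCol::Block read
    }
}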

Next steps

  • Verify it improves apply-range performance
  • Check native-transfer benchmark performance. Due to other bottlenecks it might not lead to an improvement, but at least performance shouldn't degrade.
  • Add metrics for: distance of queried heights to head, cache hits/misses.
  • Run it for a while under synthetic and under mainnet traffic to observe metrics.
  • Based on observed metrics, pick a cache size/range. Make ChainStoreUpdate::finalize trigger cache cleanup (a sketch of this follows the list).
  • Integrate with garbage collection.
    • Add plumbing to clear caches automatically in gc_col
  • Add/extend tests.
  • Check if there are other scenarios in which block related data may be updated on disk. If any, make them update/clear the cache.
  • There are more get operations that could use caching (get_chunk_extra, get_light_client_block, ...). If profiles show they take up a lot of time, consider caching them too.
  • Make the apply-range command populate the cache on ChainStore construction (it doesn't call ChainStoreUpdate::finalize).
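
For the cache-cleanup item above, a hedged sketch of a height-bounded eviction triggered from ChainStoreUpdate::finalize (the names and the MAX_CACHED_HEIGHTS bound are illustrative, not part of this PR):

use std::collections::BTreeMap;

type CryptoHash = [u8; 32];
type BlockHeight = u64;

// Illustrative bound; the actual value would be picked from the observed metrics.
const MAX_CACHED_HEIGHTS: BlockHeight = 128;

struct ChainStoreCache {
    // Keyed by height so that evicting old entries is a cheap range split.
    block_hashes_by_height: BTreeMap<BlockHeight, CryptoHash>,
}

impl ChainStoreCache {
    // Called from ChainStoreUpdate::finalize after the head advances.
    fn evict_below_head(&mut self, head_height: BlockHeight) {
        let cutoff = head_height.saturating_sub(MAX_CACHED_HEIGHTS);
        // split_off keeps everything at or above the cutoff and drops the rest.
        self.block_hashes_by_height = self.block_hashes_by_height.split_off(&cutoff);
    }
}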

Comment on lines 290 to 292
block_cache: RefCell<HashMap<CryptoHash, Block>>,
block_header_cache: RefCell<HashMap<CryptoHash, BlockHeader>>,
block_hash_by_height_cache: RefCell<HashMap<BlockHeight, CryptoHash>>,
@nagisa (Collaborator) commented Feb 27, 2025

FWIW I think it might make more sense to maintain a view of the most recent blocks in memory (pending analysis into access patterns), rather than caching separate queries and trying to implement something like LRU-evicted access-based cache here.

This would among other things potentially make the cache maintenance somewhat more straightforward and easier to reason about. First, because the only place where block information is written is ChainStoreUpdate::finalize, so that is where we'd update the cache too. And second, because some stray query (which can happen because these queries depend on user input) for a block from yesterday wouldn't affect the happy path.
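
A hedged sketch of that idea (names are illustrative): the window is written only where finalize writes block data, and a miss simply falls through to rocksdb without touching the cache:

use std::collections::VecDeque;

type CryptoHash = [u8; 32];
struct Block;

struct RecentBlocks {
    // The most recent blocks, pushed only from ChainStoreUpdate::finalize.
    window: VecDeque<(CryptoHash, Block)>,
    capacity: usize,
}

impl RecentBlocks {
    // Write path: only finalize calls this, so the contents mirror the chain head.
    fn on_finalize(&mut self, hash: CryptoHash, block: Block) {
        if self.window.len() == self.capacity {
            self.window.pop_front();
        }
        self.window.push_back((hash, block));
    }

    // Read path: a miss (e.g. a stray query for yesterday's block) returns None
    // and the caller reads rocksdb directly, leaving the window untouched.
    fn get(&self, hash: &CryptoHash) -> Option<&Block> {
        self.window.iter().find(|(h, _)| h == hash).map(|(_, b)| b)
    }
}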

@mooori (Contributor, Author) replied:

rather than caching separate queries

After making ChainStoreUpdate::finalize update the cache, I tried to return cache.block(h).header() from get_block_header. That made a lot of tests fail, and it turns out some columns get special treatment in finalize or garbage collection. So the safest way might be to have a separate cache for each DBCol, even if it's a bit of a pain to handle. See the comment on ChainStoreCache.

The apply-range command doesn't trigger ChainStoreUpdate::finalize. I suppose finding a way to call it manually, or to otherwise populate the caches for apply-range, shouldn't be too hard. As a temporary workaround, I'm inserting cache-missed data into the cache in get_block, etc. More details in the comments.

codecov bot commented Feb 27, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 69.91%. Comparing base (0c499cb) to head (b27b806).
Report is 12 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #13006      +/-   ##
==========================================
+ Coverage   69.73%   69.91%   +0.18%     
==========================================
  Files         859      859              
  Lines      175933   176279     +346     
  Branches   175933   176279     +346     
==========================================
+ Hits       122683   123250     +567     
+ Misses      48086    47870     -216     
+ Partials     5164     5159       -5     
Flag Coverage Δ
backward-compatibility 0.36% <0.00%> (-0.01%) ⬇️
db-migration 0.36% <0.00%> (-0.01%) ⬇️
genesis-check 1.43% <0.00%> (-0.01%) ⬇️
linux 69.45% <100.00%> (+0.34%) ⬆️
linux-nightly 69.47% <100.00%> (+0.17%) ⬆️
pytests 1.74% <0.00%> (-0.01%) ⬇️
sanity-checks 1.55% <0.00%> (-0.01%) ⬇️
unittests 69.75% <100.00%> (+0.18%) ⬆️
upgradability 0.36% <0.00%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown.


@mooori (Contributor, Author) commented Feb 28, 2025

ChainStoreUpdate::finalize now updates the cache and garbage collection clears items if required. The performance numbers mentioned in the PR description remain basically unchanged. The tasks for the next steps have been updated.

Comment on lines +295 to +296
// Interior mutability will not be required anymore once the apply-range command takes
// care of populating the cache. See comments below regarding apply-range not calling finalize.
Collaborator commented:

Hm, yeah, apply-range can't finalize as that would advance the chain, which is in direct opposition to what we need it to do when benchmarking. I'd say let's add a method that just fills the caches, which gets called from finalize and then separately by apply-range if it's in benchmark mode.
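
A hedged sketch of that split (update_caches is an illustrative name, not an existing method):

use std::collections::HashMap;

type CryptoHash = [u8; 32];
#[derive(Clone)]
struct Block;

struct ChainStoreUpdate {
    // Blocks this update is about to commit to rocksdb.
    new_blocks: Vec<(CryptoHash, Block)>,
}

impl ChainStoreUpdate {
    // Cache maintenance only; does not advance the chain, so apply-range can
    // call it directly when running in benchmark mode.
    fn update_caches(&self, cache: &mut HashMap<CryptoHash, Block>) {
        for (hash, block) in &self.new_blocks {
            cache.insert(*hash, block.clone());
        }
    }

    // The normal node path: refresh the cache as part of committing the update.
    fn finalize(self, cache: &mut HashMap<CryptoHash, Block>) {
        self.update_caches(cache);
        // ... existing finalize work: write to rocksdb, advance the head, ...
    }
}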

Comment on lines +305 to +306
blocks: RefCell<HashMap<CryptoHash, Block>>, // DBCol::Block
block_headers: RefCell<HashMap<CryptoHash, BlockHeader>>, // DBCol::BlockHeader
Collaborator commented:

Thought: might make sense to look at the callers of get_block etc. to see how frequently they are paired with get_block_header. If that's a common occurrence (definitely happens in check_transaction_validity IIRC) then we can reduce the lookup count by holding HashMap<CryptoHash, (Block, BlockHeader)> and providing a method that looks up both at once.

For storage I would suggest something similar to VecDeque<(CryptoHash, Block, BlockHeader)>. Of course this would require that the view of the chain is always contiguous for the cache.

If ChainStoreCache is made to hold the BlockHeight of the first block in the VecDeque, then block_hashes_by_height becomes a super efficient deque[height - first_height].0, while other lookups by hash at least:

  1. always make the same lookup in the same Map<CryptoHash, BlockHeight> before hitting the VecDeque;
  2. whenever there's a lookup for get_block, it would pull the related types into the same cache lines as well, for related lookups that can happen in nearby code paths.

All that said this is quite unlikely to make a difference either way in terms of performance (though I would still default to BTreeMap), and the only really important thing we need to resolve here is getting rid of RefCell.
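
A hedged sketch of that layout (illustrative, not code from this PR): a contiguous deque of recent blocks plus a single hash-to-height index, where a height lookup is plain index arithmetic:

use std::collections::{HashMap, VecDeque};

type CryptoHash = [u8; 32];
type BlockHeight = u64;
struct Block;
struct BlockHeader;

struct ChainStoreCache {
    // Height of the entry at index 0; the deque is contiguous by height.
    first_height: BlockHeight,
    blocks: VecDeque<(CryptoHash, Block, BlockHeader)>,
    // The one indirection shared by all hash-keyed lookups.
    height_by_hash: HashMap<CryptoHash, BlockHeight>,
}

impl ChainStoreCache {
    fn block_hash_by_height(&self, height: BlockHeight) -> Option<&CryptoHash> {
        let index = height.checked_sub(self.first_height)? as usize;
        self.blocks.get(index).map(|(hash, _, _)| hash)
    }

    // Block and header live in the same entry, so callers that need both
    // (e.g. transaction validity checks) pay for a single lookup.
    fn block_and_header(&self, hash: &CryptoHash) -> Option<(&Block, &BlockHeader)> {
        let height = *self.height_by_hash.get(hash)?;
        let index = height.checked_sub(self.first_height)? as usize;
        self.blocks.get(index).map(|(_, block, header)| (block, header))
    }
}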
