Optimize storage of data in state.DB #6547

Open
1 task
fasmat opened this issue Dec 16, 2024 · 0 comments


fasmat commented Dec 16, 2024

Description

This issue tracks optimizations that could reduce the size of state.db and improve its access speed:

Possible performance improvements

  • Use integer IDs for the references in the database #6027
    • Using integer IDs and storing the hash they represent only once in a dedicated table can greatly reduce the amount of data stored in the DB (a 4- or 8-byte integer vs. a 32-byte hash per reference).
  • The merkle proof contained in an ATX, which shows that an identity participated in a PoET round, carries redundant information that is currently stored in full in the state DB
    • In many cases the node already stores the full merkle tree that the proof was generated from (since it fetched the tree from the PoET when generating its own merkle proof/ATX). Any ATX containing a merkle proof for the same root could be stripped of this data, since the proof can easily be regenerated from the full tree when the ATX is requested by a peer (who might not have the full tree).
    • Even if the node does not have the full merkle tree, it can store the merkle proofs so that overlapping paths are deduplicated (i.e. the tree is stored with every path that has been seen). In many cases this would also deduplicate data and shrink the DB, while the proofs could still be easily reconstructed when needed for regossip/sync.
    • Replacing the full merkle proof with just the leaf index and root could save 700+ bytes per locally stored ATX, or up to 3.5 GiB per epoch.
  • It might make sense not to store ATX blobs at all, to save even more data
    • A lot of data is already extracted from blobs into tables; the remaining information, once extracted, should take much less space than the blobs themselves (especially after deduplicating merkle proofs).
    • This has to be examined for performance: re-encoding ATXs from source data might be more costly than reading them from the DB as blobs.
    • On the other hand, this would allow dropping support for older ATX versions (e.g. ATXv1) completely. The handler would then only need to decode one ATX format; only the signature verification algorithm would differ, depending on whether ATXv1 or ATXv2 ATXs were originally published in that epoch.