Optimize storage of data in state.DB #6547

fasmat · 2024-12-16T16:39:16Z

Description

This issue tracks possible optimizations to the size and access speed of state.db:

Possible performance improvements

Use integer IDs for the references in the database #6027
- Using integer IDs and storing the hash each one represents only once, in a dedicated table, can greatly reduce the amount of data that needs to be stored in the DB (a 4- or 8-byte integer vs. a 32-byte hash).
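
As a rough illustration of the integer-ID idea, the sketch below uses Python's built-in `sqlite3` module. The table and column names (`hashes`, `atx_refs`, `atx_id`, `prev_atx_id`) and the helper `intern_hash` are hypothetical and are not taken from the actual state.db schema.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    """Create an in-memory DB with a hypothetical schema (not the real state.db)."""
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        -- Each 32-byte hash is stored exactly once.
        CREATE TABLE hashes (
            id   INTEGER PRIMARY KEY,
            hash BLOB NOT NULL UNIQUE
        );
        -- Other tables reference hashes by their small integer ID.
        CREATE TABLE atx_refs (
            atx_id      INTEGER NOT NULL REFERENCES hashes(id),
            prev_atx_id INTEGER REFERENCES hashes(id)
        );
        """
    )
    return db


def intern_hash(db: sqlite3.Connection, h: bytes) -> int:
    """Return the integer ID for a hash, inserting it on first sight."""
    db.execute("INSERT OR IGNORE INTO hashes (hash) VALUES (?)", (h,))
    row = db.execute("SELECT id FROM hashes WHERE hash = ?", (h,)).fetchone()
    return row[0]
```

Every reference to a hash in another table then costs one integer column instead of a 32-byte blob, at the price of a join when the hash itself is needed.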
The merkle proof contained in an ATX, which shows that an identity participated in a PoET round, carries redundant information that is currently stored in full in the state DB
- In many cases the node already stores the full merkle tree that the proof was generated from (since it fetched the tree from the PoET when generating its own merkle proof/ATX). Any ATX containing a merkle proof for the same root could therefore be stripped of this data, since the proof can easily be regenerated from the full merkle tree when the ATX is requested by a peer (who might not have the full tree).
- Even if the node does not already have the full merkle tree, it can store the merkle proofs in a way that deduplicates overlapping paths (i.e. the tree is stored with every path that has been seen). In many cases this would also deduplicate data and reduce what is stored in the DB, while the proofs could still be reconstructed easily when needed for regossip/sync.
- Replacing the full merkle proof with just the leaf index and root could save more than 700 bytes per ATX that is stored locally, or up to 3.5 GiB per epoch.
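
The two approaches in the bullets above can be sketched as follows. This is a minimal illustration that assumes a plain binary SHA-256 merkle tree with a power-of-two number of leaves; the tree construction actually used by PoET may differ, and all names here (`build_tree`, `proof_for`, `verify`, `ProofStore`) are hypothetical.

```python
import hashlib


def _h(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(left + right).digest()


def build_tree(leaves):
    """Return all levels of the tree, leaves first and the root level last."""
    if not leaves or len(leaves) & (len(leaves) - 1):
        raise ValueError("this sketch assumes a power-of-two number of leaves")
    levels = [list(leaves)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([_h(prev[i], prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels


def proof_for(levels, index):
    """Regenerate the proof (sibling hashes, bottom up) from the full tree."""
    proof = []
    for level in levels[:-1]:
        proof.append(level[index ^ 1])  # sibling of the current node
        index >>= 1
    return proof


def verify(leaf, index, proof, root):
    node = leaf
    for sibling in proof:
        node = _h(sibling, node) if index & 1 else _h(node, sibling)
        index >>= 1
    return node == root


class ProofStore:
    """Stores proof nodes for ONE root, keyed by (level, position),
    so that overlapping paths share their common nodes."""

    def __init__(self):
        self.nodes = {}

    def add(self, index, proof):
        for level, sibling in enumerate(proof):
            self.nodes[(level, (index >> level) ^ 1)] = sibling

    def get(self, index, depth):
        return [self.nodes[(level, (index >> level) ^ 1)] for level in range(depth)]
```

With the full tree available, only the leaf index and root need to be kept per ATX and `proof_for` rebuilds the path on request. Without the full tree, `ProofStore` keeps each node once: two proofs for neighbouring leaves in an eight-leaf tree occupy four entries instead of six.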
It might make sense to not store blobs for ATXs at all to save even more data
- A lot of data is already extracted from blobs into tables; storing the remaining information in tables as well should take much less space than the blobs themselves (especially after deduplicating merkle proofs).
- This has to be examined for performance: re-encoding ATXs from source data might be more costly than reading them from the DB as blobs.
- On the other hand, this would make it possible to stop supporting older ATX versions (e.g. ATXv1) completely. The handler would then only need to decode one ATX format; only the signature verification algorithm would differ, depending on whether ATXv1 or ATXv2 ATXs were originally published in that epoch.

fasmat added the labels technical debt, resource/storage, area/poet, area/atx on Dec 16, 2024
