Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set iterator read bounds on deleting column families to avoid very slow syncs #7664

Closed
teor2345 opened this issue Oct 4, 2023 · 5 comments · Fixed by #7732 or #7731
Closed

Set iterator read bounds on deleting column families to avoid very slow syncs #7664

teor2345 opened this issue Oct 4, 2023 · 5 comments · Fixed by #7732 or #7731
Assignees
Labels
A-state Area: State / database changes C-bug Category: This is a bug C-security Category: Security issues C-tech-debt Category: Code maintainability issues I-remote-trigger Remote nodes can make Zebra do something bad I-slow Problems with performance or responsiveness

Comments

@teor2345
Copy link
Contributor

teor2345 commented Oct 4, 2023

Motivation

Iterating on a column family that deletes lots of keys can be very slow.

We fixed some of this code in PRs #7663 and #7392, but we need a general fix to stop this issue happening again.

This currently impacts the utxo_loc_by_transparent_addr_loc column family, and any future uses of iterators with zs_delete(). It can be remotely triggered.

Specifications

The RocksDB documentation recommends specifying lower and upper bounds on iterators:
https://github.com/facebook/rocksdb/wiki/Iterator#iterating-upper-bound-and-lower-bound
https://docs.rs/rocksdb/latest/rocksdb/struct.ReadOptions.html#method.set_iterate_range
https://docs.rs/rocksdb/latest/rocksdb/struct.DBCommon.html#method.iterator_cf_opt

This is how these bugs in other projects...
https://tracker.ceph.com/issues/55324
https://jira.mariadb.org/browse/MDEV-19670

Were fixed in those projects:
https://github.com/ceph/ceph/pull/46096/files
facebook/rocksdb#5403

Complex Code or Requirements

We need to use iterators for some operations, so our alternatives are (in preferred order):

  1. Minimise the number of keys we delete, and how often we delete them (currently only UTXOs require key deletion)
  2. Avoid using iterators on column families where we delete keys (currently only one column family does this, and it won't be necessary after PR fix(security): fix concurrency issues in tree key formats, and CPU usage in genesis tree roots #7392)
  3. If we must use iterators on those column families, set read bounds to minimise the amount of deleted data that is read (this ticket)

If this solution passes our benchmark tests, we should document it in:
https://github.com/ZcashFoundation/zebra/blob/main/book/src/dev/state-db-upgrades.md

Testing

When we implement the test in ticket #7649, it will also cover this change.

Documentation

We should document this issue in the developer state upgrade docs.

Related Work

This is a long-term fix for #7618.

PR #7663 implements alternative 2 for some column families.
PR #7392 implements alternative 1 for some column families.

@teor2345 teor2345 added C-bug Category: This is a bug S-needs-triage Status: A bug report needs triage P-High 🔥 C-security Category: Security issues I-slow Problems with performance or responsiveness A-state Area: State / database changes C-tech-debt Category: Code maintainability issues I-remote-trigger Remote nodes can make Zebra do something bad labels Oct 4, 2023
@mpguerra
Copy link
Contributor

mpguerra commented Oct 9, 2023

Hey team! Please add your planning poker estimate with Zenhub @arya2 @oxarbitrage @teor2345 @upbqdn

@teor2345
Copy link
Contributor Author

This could be useful for making an exclusive end bound:
https://docs.rs/primitive-types/0.12.2/primitive_types/struct.U256.html#method.checked_add

@teor2345
Copy link
Contributor Author

@arya2 I just added this to the ticket, I'm happy to do it (along with a bunch of other state docs):

Documentation

We should document this issue in the developer state upgrade docs.

@mpguerra
Copy link
Contributor

@arya2 and @teor2345 who is taking the lead on this one?

@arya2
Copy link
Contributor

arya2 commented Oct 11, 2023

who is taking the lead on this one?

Teor and I talked through what changes are needed, and I'll be writing them up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-state Area: State / database changes C-bug Category: This is a bug C-security Category: Security issues C-tech-debt Category: Code maintainability issues I-remote-trigger Remote nodes can make Zebra do something bad I-slow Problems with performance or responsiveness
Projects
Archived in project
3 participants