Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split up mempool fetch and mempool write #77

Merged
merged 5 commits into from
Apr 3, 2024

Conversation

jamesdorfman
Copy link

Currently, the mempool's lock is acquired before doing the following steps to update the in-memory version of the mempool:

  1. Fetch the old mempool's TXIDs from disk
  2. Fetch the new mempool's TXIDs over RPC
  3. Fetch the full transactions of the new TXIDs
  4. Write the new transactions to the in-memory transaction index

Right now, all four steps are done while holding a write lock on the mempool.
We suspect that this is causing incoming requests which read from the query's mempool to fail, because steps 1 - 4 together

In reality, step 1 only needs a read lock on the mempool, step 2 and 3 need no lock and only step 4 needs the write lock.
This PR updates the code to only acquire a write lock for step 4.

It also adds logging around each step, so that we can more-easily see were the bottlenecks are when long response times occasionally occur.

Since the lock is dropped between step 1 and step 4, this opens up potential consistency issues. However, step 4 is the only thing which updates the mempool -- no updates come from other threads, so this shouldn't cause any issues.

Copy link
Collaborator

@shesek shesek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Some comments that apply to whole commits and not specific code lines)

7389f42 Connection duration logging

Why is including the time spent sending the reply desirable? (I also commented on that here: #75 (comment))

d2c6ab5 Additional verbose error handling

The error-chain crate's chain_err() method already does chain the original error as the context for the error returned from the closure, so there should be no need to do this manually.

You can use display_chain() where the error is displayed to show its full context (like for example here).

72a90bc Fn duration instrumentation

We already have a mechanism in place to instrument function execution durations using Prometheus. Why introduce another one?

Prometheus has some pretty useful reporting, visualization and querying abilities, and should be better suited and more robust for this. It already covers many of the functions, including some that the log_fn_duration instrumentation was also added to.

The rust-prometheus API is also pretty ergonomic to work with - the timer stops automatically when dropped so that you don't have to calculate durations manually, which also means that you don't have to hold return values in a temporary res variable and can instead return them directly.

Also, it seems like this might be overly verbose - do we really need to measure things like the simple match in get_electrum_height() or the hex string parsing in hash_for_value()? Ideally I'd prefer the code not to be overly cluttered with instrumentations.

93571b9 Using thread::available_parallelism for rayon::ThreadPoolBuilder

Using the number of available cores is rayon's default. Removing the call to ThreadPoolBuilder::num_threads() should have the same effect.

(Note that the documentation does mention that this behavior may change in the future, but IMO we should deal with this when/if we upgrade to a rayon release that changes this. It might be the case that they never do, or that the new default behavior they implement is preferable for us too.)

src/electrum/server.rs Outdated Show resolved Hide resolved
src/bin/electrs.rs Outdated Show resolved Hide resolved
src/new_index/mempool.rs Outdated Show resolved Hide resolved
src/new_index/mempool.rs Outdated Show resolved Hide resolved
src/new_index/mempool.rs Outdated Show resolved Hide resolved
src/bin/electrs.rs Outdated Show resolved Hide resolved
src/new_index/mempool.rs Outdated Show resolved Hide resolved
src/new_index/mempool.rs Show resolved Hide resolved
src/new_index/mempool.rs Outdated Show resolved Hide resolved
src/new_index/mempool.rs Outdated Show resolved Hide resolved
src/bin/electrs.rs Outdated Show resolved Hide resolved
src/new_index/mempool.rs Outdated Show resolved Hide resolved
src/new_index/mempool.rs Outdated Show resolved Hide resolved
@jamesdorfman jamesdorfman force-pushed the isolate-db-code branch 2 times, most recently from b091634 to acc05bb Compare March 27, 2024 22:07
Copy link
Collaborator

@shesek shesek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two tiny nits, all LGTM other than that. Thanks James!

Could you please squash the "address review" commits (leaving the "remove stale mempool txs" commit separate)? Also, can you mention the change regarding the mempool sync startup loop in the commit?

src/new_index/mempool.rs Outdated Show resolved Hide resolved
src/new_index/mempool.rs Outdated Show resolved Hide resolved
@jamesdorfman
Copy link
Author

@shesek Thanks for the review! I believe I've addressed your latest comments. Please let me know if I missed anything / misunderstood.

src/new_index/mempool.rs Outdated Show resolved Hide resolved
src/bin/electrs.rs Outdated Show resolved Hide resolved
src/new_index/mempool.rs Outdated Show resolved Hide resolved
@shesek shesek changed the title Draft: split up mempool fetch and mempool write Split up mempool fetch and mempool write Apr 3, 2024
@shesek shesek merged commit 7c96c3e into Blockstream:new-index Apr 3, 2024
1 check passed
@junderw
Copy link

junderw commented May 1, 2024

See this commit on the PR based off this PR: mempool/electrs@9e0ecad

If someone tried to call broadcast with a tx we are trying to update in between the grabbing of old_txids and before adding the tx in the update call, the tx will already exist and since the history entries are in Vecs you could get duplicate history entries.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants