Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Limit Writes to Main Thread #201

Open
Sword-Smith opened this issue Oct 9, 2024 · 1 comment
Open

Limit Writes to Main Thread #201

Sword-Smith opened this issue Oct 9, 2024 · 1 comment

Comments

@Sword-Smith
Copy link
Member

Currently, the RPC task is allowed (by convention) to acquire write locks on the wallet state. This generates the subtle issue: what if writing fails?

Currently:

            // ensure we write new wallet state out to disk.
            gsm.persist_wallet().await.expect("flushed wallet");

it only crashes the RPC task and not the entire program. If writes fail, we should crash the entire program.

Limiting write permission to the main thread would enable the main thread to terminate the program when it observes a write fail.

This change would imply that every time the RPC task acquires a write lock, it must instead send a write message to the main task through its RPCServerToMain channel.

Co-authored-by: @aszepieniec

@dan-da
Copy link
Collaborator

dan-da commented Oct 11, 2024

Some observations / details:

  1. moving all disk-writes to main task adds complexity. (and cognitive load). How does RPC task ensure each disk write succeeds for example?

  2. RPC-server test-cases as presently written do not execute the main task, only a dummy. So such inter-task dependencies do not get executed by the tests. I ran into this when testing the send() RPC and found that Tx were not being added to the mempool because that was supposed to be done by the main task.

  3. if the goal is to panic/shutdown on disk-write error, that can be achieved without moving writes to main thread. When a tokio task panics this can be detected by the corresponding JoinHandle. So the parent/monitoring task can: (a) ignore the panic (b) handle it somehow and continue, (c) raise the panic and/or shutdown. Presently we do (a) because the select!() in main_loop doesn't monitor the tasks in task_handles array. But we could change that, so that a panic in any sub-task would cause entire program to halt. I would view that approach as preferable. I don't think we should be ignoring panics, for whatever reason.

  1. shutting down on disk-write failure with a useful log message is probably a reasonable thing to do for now. However this can still leave DBs in an inconsistent state, if write to file #1 succeeds and then write to file #2 fails. Long term, I'd like to see true atomic operations, ie changes for a given logical operation are either 100% written or not at all.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants