Tokio runtime being blocked by nitox. #24
Maybe we could simply enable lock fairness, as described here: https://docs.rs/lock_api/0.1.5/lock_api/struct.RwLockWriteGuard.html#method.unlock_fair. I think it's worth trying when the lock guard is held for too long.
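For reference, here's a minimal sketch of what fair unlocking looks like with `parking_lot` (illustrative only: the lock contents and the update are stand-ins, not nitox code):

```rust
use parking_lot::{RwLock, RwLockWriteGuard};

fn main() {
    let state = RwLock::new(Vec::<u8>::new());

    let mut guard = state.write();
    guard.extend_from_slice(b"some update");

    // Instead of dropping the guard (which lets this thread barge back
    // in and re-acquire immediately), hand the lock to the longest
    // waiting thread.
    RwLockWriteGuard::unlock_fair(guard);
}
```

Note this only changes who gets the lock next; acquiring the lock still blocks the calling thread.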
Also, I want to apologize for the relative inactivity; we are currently super swamped on R&D here and can't afford extra time to maintain code that already works super well for us in production. We'll get back to nitox super hard when we're working on backend stuff in the current project :)
I definitely understand being swamped. I was able to find a workaround, so I'm not blocked by this.

As for `unlock_fair`: that won't actually solve the problem. It may reduce the amount of time that the event loop is blocked, but it only takes effect when there are stacked readers and writers on top of an already acquired write. The fundamental problem is that it is a thread-blocking abstraction inside of a parent abstraction which requires that nothing block (in order for the abstraction to remain effective). I would sooner reach for a futures-aware lock.

However, the channels based approach mentioned above is very simple to implement, and not having to use locks tends to simplify the code. IMHO, it would be the way to go.
@OtaK also, I'm happy to implement the proposed channels based pattern. I just don't want to go vigilante and land a bunch of changes which the maintainers don't want.
I've looked a bit into it and:
As much as I like message passing, I think that the overhead it'd create and the introduced complexity are kinda not worth it if it's possible to fix this problem by automatically dropping streams with the last option.
@OtaK so, on the channels idea: the use of channels would be completely opaque to users of the crate. The interface could arguably stay almost exactly the same. Channels would only be used internally for communication between the client instances and the "daemon" type which actually owns the connection.

A big benefit of this as well is that we wouldn't have to worry about blocking issues at all. The whole thing would be purely non-blocking.

The main reason I am still pushing towards this idea is that the workaround for dealing with code which may periodically block is to spawn it on a separate tokio runtime instance, which can be a bit of a resource overhead. And if the library isn't actually designed at its base to be non-blocking, subtle issues may come up which cause requests to be dropped for no obvious reason, and then days of debugging ensue ☠️ (which is how I discovered this issue).
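To make the shape concrete, here's a minimal sketch of the pattern, with hypothetical names (`Command`, `ClientHandle`, `daemon`) and written against modern tokio's `mpsc`/`oneshot` for brevity; the same structure applies with the futures 0.1 channels the crate currently targets:

```rust
use tokio::sync::{mpsc, oneshot};

// Commands sent from lightweight client handles to the daemon task.
enum Command {
    Publish { subject: String, payload: Vec<u8> },
    Subscribe { subject: String, sid_tx: oneshot::Sender<u64> },
}

// The daemon exclusively owns all connection state; since nothing is
// shared, no locks are needed anywhere.
async fn daemon(mut rx: mpsc::Receiver<Command>) {
    let mut next_sid = 0u64;
    while let Some(cmd) = rx.recv().await {
        match cmd {
            Command::Publish { subject, payload } => {
                // The real code would write a PUB frame to the socket here.
                let _ = (subject, payload);
            }
            Command::Subscribe { subject, sid_tx } => {
                // The real code would write a SUB frame and register the
                // sid -> subject mapping here.
                next_sid += 1;
                let _ = subject;
                let _ = sid_tx.send(next_sid);
            }
        }
    }
    // All handles dropped: the daemon exits and the connection closes.
}

// A cheaply cloneable handle; the public interface stays the same.
#[derive(Clone)]
struct ClientHandle {
    tx: mpsc::Sender<Command>,
}

impl ClientHandle {
    async fn publish(&self, subject: &str, payload: Vec<u8>) -> bool {
        let cmd = Command::Publish { subject: subject.to_owned(), payload };
        self.tx.send(cmd).await.is_ok()
    }

    async fn subscribe(&self, subject: &str) -> Option<u64> {
        let (sid_tx, sid_rx) = oneshot::channel();
        let cmd = Command::Subscribe { subject: subject.to_owned(), sid_tx };
        self.tx.send(cmd).await.ok()?;
        sid_rx.await.ok()
    }
}

#[tokio::main]
async fn main() {
    let (tx, rx) = mpsc::channel(64);
    tokio::spawn(daemon(rx));

    let client = ClientHandle { tx };
    client.publish("greet.joe", b"hello".to_vec()).await;
    println!("sid = {:?}", client.subscribe("greet.*").await);
}
```

The design point is that the socket and subscription state are owned by exactly one task, so ownership replaces synchronization and nothing ever blocks a runtime thread.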
Either way, I'm definitely thinking that the stream cancel & streaming support PRs should land first. Maybe even a quick update to the code base overall to cut over to the 2018 edition; we can use `cargo fix` for that. After those items land, I'll experiment with both options: one that just updates the existing locking to use non-blocking primitives, and one that cuts over to the channels based approach.
I observed some very interesting behavior while using this crate in a few services of mine: running multiple tasks on the tokio runtime along with a nitox client causes the other tasks to never be executed, or to receive very minimal execution, while nitox is running.
Of course, first I suspected that it was my code, but I reviewed it thoroughly and did not find anything obvious which would block the runtime. After reviewing the networking code in this crate, it looks like `parking_lot::RwLock` is used as a fairly fundamental building block for the nats client and the connection multiplexer. This is what is causing the blocking.

The `parking_lot` crate does provide smaller and more efficient locking mechanisms than the `std::sync` primitives, but they still block the calling thread when attempting to acquire read/write access. The documentation explicitly states this. Take `write` for example, on `RwLock`'s type def, which uses `lock_api::RwLock` under the hood: the docs state that it blocks the current thread until the lock can be acquired. The blocking locks are used primarily in the client, net/connection & net/connection_inner.
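As an illustration of the failure mode (not nitox's actual code): on a current-thread runtime, any task that blocks the thread, whether in `std::thread::sleep` or in a contended `RwLock::write`, starves every other task. A minimal sketch using modern tokio syntax:

```rust
use std::time::Duration;

#[tokio::main(flavor = "current_thread")]
async fn main() {
    // Stand-in for a task that blocks the thread, e.g. waiting on a
    // contended parking_lot write lock.
    tokio::spawn(async {
        std::thread::sleep(Duration::from_secs(5));
        println!("blocking task finished");
    });

    // This 100ms timer cannot fire until the blocking task yields the
    // only runtime thread, so it prints roughly 5 seconds late.
    tokio::time::sleep(Duration::from_millis(100)).await;
    println!("timer fired (late)");
}
```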
what do we do about this?
First, I think that this is not a good thing. We definitely do not want nitox to block the event loop, because then it essentially defeats the purpose of using an event loop. Pretty sure we can all agree with that.
As a path forward, I would propose that we use a non-blocking, message passing based approach, a pattern which I have used quite successfully in the past. Details below.
message passing instead of locks
The idea: introduce a `daemon` type which exclusively owns the connection, spawn it onto the runtime, and have client instances talk to it over channels; the connect future would then resolve to a `Result<NatsClient, NatsError>` or the like.
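A hedged sketch of what that connect shape could look like (hypothetical types and modern tokio syntax, not the crate's actual API):

```rust
use tokio::io::AsyncWriteExt;
use tokio::net::TcpStream;
use tokio::sync::mpsc;

pub struct NatsClient {
    tx: mpsc::Sender<Vec<u8>>,
}

#[derive(Debug)]
pub struct NatsError(std::io::Error);

// Connecting hands the socket to a spawned daemon task and returns a
// handle; from here on, nothing in the public API can block a thread.
pub async fn connect(addr: &str) -> Result<NatsClient, NatsError> {
    let mut socket = TcpStream::connect(addr).await.map_err(NatsError)?;
    let (tx, mut rx) = mpsc::channel::<Vec<u8>>(1024);

    tokio::spawn(async move {
        while let Some(frame) = rx.recv().await {
            if socket.write_all(&frame).await.is_err() {
                break; // connection lost; pending sends will error out
            }
        }
    });

    Ok(NatsClient { tx })
}
```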