-
Notifications
You must be signed in to change notification settings - Fork 61
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BasicExternalCluster interruption fails in some situations #483
Comments
Looking at a thread dump, it appears as though the hang is in This probably means we need to make that method either throw the interruption or commute it into a |
This change requires more consideration than we can give this close to release. We can re-evaluate later. |
This actually represents a problem in how we handle the acknowledgements around sent messages. Even if we break out of this loop and throw the We may need to create a checked exception which contains the intermediary state of the message, allowing the caller to resume the wait for acks before eventually receiving the Ultimately, I am not sure if we can fully do this while still allowing interruption to be honored, when looking at some of the other parts of the API without adding substantial complexity to exception flow. Cases like |
Adding notes about some options discussed in the last sync-up.
|
We recently saw a test hang and we suspect it is due to how
BasicExternalCluster
handles asynchronous failures of a server.Galvan reported that it called
forceShutdown
but the test client continued (waiting to reconnect to the server).Since
BasicExternalCluster
uses an in-process client, it relies on thread interruption to shut down the client when a server crashes unexpectedly. The interrupt is delivered to the client thread but we suspect that this thread is either in a path which doesn't expect interruption (and drops it) or is in an operation which doesn't expose interruption.We need to construct a test which kills a server at an unexpected time to verify that we correctly handle this case (our existing tests which rely on this do so earlier in the run, prior to creating a connection). From there, we can find the actual underlying bug.
The text was updated successfully, but these errors were encountered: