If DisableInitialHostLookup, Network Instability Can Cause Prolonged "no hosts available in pool" #1721
Comments
Hi! Thank you for reporting the issue! It seems that we should update the Also I'm curious if we should deprecate the |
If DisableInitialHostLookup is enabled, the Host IDs are random UUIDs. Therefore, on the first ring refresh, remove the hosts and re-add them back and refill the pools. Closes apache#1721
The PR I submitted above seems to solve the problem. Let me know what you think. In general, I think The only possible corner-case I can think of is that if the initial provided hosts are resolvable, but all hosts are in a bad node state, that |
Hello @amcrn , any example of the implementation of a edit: cluster.PoolConfig = gocql.PoolConfig{
HostSelectionPolicy: gocql.SingleHostReadyPolicy(gocql.RoundRobinHostPolicy()),
} |
What version of Cassandra are you using?
Reproducible on 3.x & 4.x
What version of Gocql are you using?
v1.6.0
What version of Go are you using?
1.20
What did you do?
If `DisableInitialHostLookup` is set to `true`, the ID for each `gocql.HostInfo` is set to a random UUID.
https://github.com/gocql/gocql/blob/db6d5564dd6843cc08cc1d6c3642612adf94c618/session.go#L244
When `refreshRing(...)` is invoked, `system.peers` is queried, returning the new list of Hosts. However, these `gocql.HostInfo` have the ID set as the real `host_id` from `system.peers`. Therefore, when `r.session.ring.addHostIfMissing(h)` is invoked, the ID does not match any of the random UUIDs, resulting in `hostIPToUUID` pointing to the new ID, but `hostList` appends.
https://github.com/gocql/gocql/blob/db6d5564dd6843cc08cc1d6c3642612adf94c618/ring.go#L95-L101
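To make the ID mismatch concrete, here is a minimal, self-contained sketch. It does not use gocql itself: `hostInfo`, `ring`, and `newRing` are hypothetical stand-ins, and the exact place where the address-to-ID map gets updated is an assumption of this model rather than a copy of the driver's code. It only illustrates how a lookup keyed on host ID treats the same address as a brand-new host.

```go
package main

import "fmt"

// hostInfo is a hypothetical, stripped-down stand-in for gocql.HostInfo.
type hostInfo struct {
	id   string // random UUID or the real host_id from system.peers
	addr string // connect address
}

// ring is a simplified model of the ring bookkeeping described in the issue.
type ring struct {
	hosts        map[string]*hostInfo // keyed by host ID
	hostIPToUUID map[string]string    // connect address -> host ID
	hostList     []*hostInfo
}

func newRing() *ring {
	return &ring{
		hosts:        map[string]*hostInfo{},
		hostIPToUUID: map[string]string{},
	}
}

// addHostIfMissing checks membership by host ID only and reports whether
// that ID was already known. (Assumption: the address map is overwritten
// here; the real driver may update it elsewhere.)
func (r *ring) addHostIfMissing(h *hostInfo) (*hostInfo, bool) {
	r.hostIPToUUID[h.addr] = h.id
	existing, ok := r.hosts[h.id]
	if !ok {
		r.hosts[h.id] = h
		r.hostList = append(r.hostList, h)
		existing = h
	}
	return existing, ok
}

func main() {
	r := newRing()

	// Contact point registered with a random UUID (DisableInitialHostLookup=true).
	r.addHostIfMissing(&hostInfo{id: "random-uuid-1", addr: "10.0.0.1"})

	// Ring refresh: the same address now carries the real host_id.
	_, ok := r.addHostIfMissing(&hostInfo{id: "real-host-id-1", addr: "10.0.0.1"})

	fmt.Println(ok)                         // false: treated as a new host
	fmt.Println(len(r.hostList))            // 2: the list grew instead of de-duplicating
	fmt.Println(r.hostIPToUUID["10.0.0.1"]) // real-host-id-1
}
```

In this model the address map points at the real ID while the host list still holds the random-UUID entry, which matches the inconsistency described above.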
`addHostIfMissing(h)` returns `!ok`, resulting in `r.session.startPoolFill(h)` being invoked for each Host, which in turn calls `s.pool.addHost(host)` (which checks by Host ID), and `s.policy.AddHost(host)`. The Policy's `AddHost` however uses a `cowHostList`, which defines equality by the `ConnectAddress`, not the `ID`!
https://github.com/gocql/gocql/blob/b9737ddcadbbe8092b27df4f6ab2e6e9f3cf4c72/host_source.go#L138-L145
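The two equality checks can be contrasted with a small model. Again, this is only a sketch: `policyHosts` and `connPools` are hypothetical types invented for illustration, assuming a policy list that compares connect addresses and a pool map keyed by host ID, as the paragraph above describes.

```go
package main

import "fmt"

// hostInfo is a hypothetical stand-in for gocql.HostInfo.
type hostInfo struct {
	id   string
	addr string
}

// policyHosts models a host list that treats two hosts as equal
// when their connect addresses match.
type policyHosts struct {
	list []*hostInfo
}

// add reports false when a host with the same address is already present.
func (p *policyHosts) add(h *hostInfo) bool {
	for _, e := range p.list {
		if e.addr == h.addr {
			return false
		}
	}
	p.list = append(p.list, h)
	return true
}

// connPools models a set of connection pools keyed by host ID.
type connPools struct {
	byID map[string]*hostInfo
}

// addHost reports false when a pool for that host ID already exists.
func (c *connPools) addHost(h *hostInfo) bool {
	if _, ok := c.byID[h.id]; ok {
		return false
	}
	c.byID[h.id] = h
	return true
}

func main() {
	policy := &policyHosts{}
	pools := &connPools{byID: map[string]*hostInfo{}}

	initial := &hostInfo{id: "random-uuid-1", addr: "10.0.0.1"}
	refreshed := &hostInfo{id: "real-host-id-1", addr: "10.0.0.1"}

	for _, h := range []*hostInfo{initial, refreshed} {
		policy.add(h)
		pools.addHost(h)
	}

	fmt.Println(len(policy.list)) // 1: de-duplicated by address
	fmt.Println(len(pools.byID))  // 2: one pool per ID, including the stale random UUID
}
```

Under these assumptions the policy ends up with one host while the pool map holds two entries for the same address.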
This mismatch of checks (ID vs. IP/Addr) results in the Policy de-duplicating the Host, while the Pool does not. This becomes a problem if a network flap occurs again. One variant of this problem is that, in `queryExecutor.do(...)`, `hostIter()` can only return `nil` because `q.policy`'s hosts are empty, whereas `q.pool`'s `hostConnPools` still contains the random UUID's entry, with `filling` repeatedly being set to `true` on `reconnect()` attempts. This effectively starves the ability to get an Up host, resulting in non-stop `gocql: no hosts available in pool` until the session is re-created or the application is restarted.
The workaround being used at the moment is to set `DisableInitialHostLookup` to `false` and use `SingleHostReadyPolicy` (to reduce the connection time that having `DisableInitialHostLookup=true` was originally achieving).
What did you expect to see?
That when `DisableInitialHostLookup=true`, and a ring refresh occurs, the code correctly de-duplicates.
What did you see instead?
Remaining Pools associated with non-existent UUIDs, which causes side-effects on subsequent network flaps.