CASSGO-41 Deadlock in refreshDebouncer when reconnection fails #1752

Open
kevinkyyro opened this issue May 22, 2024 · 2 comments · May be fixed by #1767

Comments

@kevinkyyro

What version of Cassandra are you using?

astra-classic

What version of Gocql are you using?

v1.6.0

What version of Go are you using?

1.21

What did you do?

Connection errors, I think due to overload, lead to frequent reconnection attempts and failures

What did you expect to see?

Should retry until connection succeeds

What did you see instead?

Deadlock

goroutine 1324045437 [chan send, 113 minutes]:
github.com/gocql/gocql.(*refreshDebouncer).stop(0xc0b826a7c0)
        /go/pkg/mod/github.com/gocql/gocql@v1.6.0/host_source.go:848 +0x8c
github.com/gocql/gocql.(*Session).Close(0xc03efb0c00)
        /go/pkg/mod/github.com/gocql/gocql@v1.6.0/session.go:494 +0x105
github.com/gocql/gocql.NewSession({{0xc24be58930, 0x3, 0x3}, {0x2ef55cf, 0x5}, 0x4, 0x12a05f200, 0x12a05f200, 0x0, 0x755a, ...})
        /go/pkg/mod/github.com/gocql/gocql@v1.6.0/session.go:180 +0x98d
github.com/gocql/gocql.(*ClusterConfig).CreateSession(...)
        /go/pkg/mod/github.com/gocql/gocql@v1.6.0/cluster.go:289

It looks like a race condition between (*refreshDebouncer).stop() and (*refreshDebouncer).flusher()

  1. stop() acquires d.mu and sets d.stopped to true
  2. flusher() exits the select at the top of the loop and blocks on acquiring d.mu
  3. stop() releases d.mu and tries to write to d.quit
  4. flusher() acquires d.mu and returns because d.stopped is true
  5. stop() is deadlocked because d.quit is unbuffered and the reader has stopped
@joao-r-reis
Contributor

I can work on a fix for this, but it will likely only get merged once this driver is donated to the ASF (see #1749 (comment) and #1751).

@joao-r-reis joao-r-reis linked a pull request Jun 6, 2024 that will close this issue
@joao-r-reis
Contributor

joao-r-reis commented Jun 6, 2024

By the way @kevinkyyro, this won't really help with gocql not reconnecting; it will just stop the deadlock from happening. I believe your reconnection issue is related to gocql-astra adding a single contact point, which causes gocql to fail after one connection attempt (see datastax/gocql-astra#24).

In any case I've opened a PR to fix this deadlock issue: #1767

@joao-r-reis joao-r-reis changed the title Deadlock in refreshDebouncer when reconnection fails CASSGO-41 Deadlock in refreshDebouncer when reconnection fails Nov 25, 2024