gocql panics with panic: scylla: <ip>:9042 invalid number of shards when restarting node with higher resources assigned #145
Comments
The comparison that triggers the panic is at lines 399 to 401 in 61be561.
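For context, here is a minimal, self-contained sketch of the kind of check involved; all identifiers are hypothetical and do not mirror gocql's actual code. A shard-aware connection picker caches the shard count learned from the first connection to a host and hard-asserts that every later connection agrees:

```go
package main

import "fmt"

// Hypothetical sketch: a conn carries the shard it serves and the total
// shard count the node reported during the handshake.
type conn struct{ shard, nrShards int }

// shardPicker caches the shard count from the first connection to a host.
type shardPicker struct {
	address  string
	nrShards int
	conns    []*conn
}

func (p *shardPicker) put(c *conn) {
	if p.nrShards != c.nrShards {
		// A hard assertion of this shape is what surfaces as the reported
		// panic when a node restarts with more CPUs and thus more shards.
		panic(fmt.Sprintf("scylla: %s invalid number of shards", p.address))
	}
	p.conns[c.shard] = c
}

func main() {
	p := &shardPicker{address: "10.0.0.1:9042", nrShards: 2, conns: make([]*conn, 2)}
	p.put(&conn{shard: 0, nrShards: 2}) // ok: shard count matches
	p.put(&conn{shard: 1, nrShards: 4}) // panics: node now reports 4 shards
}
```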
It seems like the node has changed the number of shards, but I'll have to double-check that this really occurred. In any case, the driver should not panic.
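Building on the hypothetical sketch above, a non-panicking alternative (a sketch only, not the fix that eventually landed) would be to treat a shard-count mismatch as a recoverable, per-connection error and let the pool rebuild its shard map:

```go
// Hypothetical sketch, not the actual fix: reject the stale connection with
// an error instead of crashing the whole process.
func (p *shardPicker) putChecked(c *conn) error {
	if p.nrShards != c.nrShards {
		return fmt.Errorf("scylla: %s shard count changed from %d to %d; connection pool must be rebuilt",
			p.address, p.nrShards, c.nrShards)
	}
	p.conns[c.shard] = c
	return nil
}
```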
I am working on that. It looks like the check that causes the panic was already present in the previous version that worked, so I need to bisect to determine the actual change that broke the test.
Any updates?
I found the commit that broke the test: 0a990b2. It is quite large, but I managed to narrow the search down to a single file.
@sylwiaszunejko any updates?
apache#1729 fixes the problem. We wanted to wait for upstream to merge it, but it looks like we'll only merge it to our fork. I will release the next version of gocql soon.
scylladb/scylla-operator#1528 merged, thanks @avelanarius @sylwiaszunejko!
One of Scylla Operator's E2E tests started failing after updating the gocql dependency from v1.7.3 to v1.11.1, due to gocql panicking with the following logs:
https://github.com/scylladb/scylla-operator/actions/runs/6178751661/job/16772628746#step:3:3702
I bisected the repository and confirmed that, before we reverted to v1.7.3, the last good commit was 6b310ee0ce1c7a72e4d16555dedf4e1cf7058258, and that bumping gocql's version from there was enough to break the test.

The failing test is https://github.com/scylladb/scylla-operator/blob/master/test/e2e/set/scyllacluster/scyllacluster_updates.go.
It was failing quite consistently on our master (GitHub CI node with kubeadm and cri-o) before reverting. I was also able to consistently reproduce it locally with a similar setup.
Debug logs from a local run:
Test scenario:
Now with v1.11.1, gocql panics with panic: scylla: <node-ip>:9042 invalid number of shards, which is a regression from v1.7.3.

Prerequisites for reproducing:
Steps to reproduce:
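As a rough illustration of the scenario (the contact point below is a placeholder, not from the original report), a client loop like this sketch, kept running while the node is restarted with more CPUs and therefore more shards, is the kind of workload where the panic surfaces:

```go
package main

import (
	"log"
	"time"

	"github.com/gocql/gocql"
)

func main() {
	// Placeholder contact point; use a node of the ScyllaCluster under test.
	cluster := gocql.NewCluster("10.0.0.1")
	cluster.Timeout = 5 * time.Second

	session, err := cluster.CreateSession()
	if err != nil {
		log.Fatalf("creating session: %v", err)
	}
	defer session.Close()

	// Keep querying while the node restarts with higher resources. With
	// gocql v1.11.1 the process eventually dies with
	// "panic: scylla: <node-ip>:9042 invalid number of shards".
	for {
		if err := session.Query("SELECT now() FROM system.local").Exec(); err != nil {
			log.Printf("query error: %v", err)
		}
		time.Sleep(time.Second)
	}
}
```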
Additional context:
The issue never occurred in our presubmits, which run in a different environment with Prow CI, where we use GKE nodes for both master and worker nodes. My only guess was a different networking setup: GKE Dataplane V2 is implemented using Cilium, while our GitHub CI uses cri-o's default network configuration.
For this reason I set up a local kubeadm installation with Cilium v1.14.2 without kube-proxy. Unfortunately, the issue still reproduced, so this didn't help narrow it down, but maybe you'll find the information helpful.
What version of Scylla or Cassandra are you using?
ScyllaDB OS 5.2.7
What version of Gocql are you using?
1.11.1
What version of Go are you using?
1.20
Cross reference: scylladb/scylla-operator#1399
@avelanarius please let me know if you need any additional information or if you could use any help with reproducing the issue.
cc @tnozicka @mykaul