Inconsistent Signer Behavior #159
Comments
Are you seeing the |
The signers do not pick a sentry. The cosigners connect to all of the configured sentries, and each sentry picks a remote signer. This is standard Tendermint behavior: a sentry sends sign requests to only one of its connected remote signers. |
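As a rough illustration of the routing described above, here is a toy Python model. It is only a sketch, not Horcrux or Tendermint code: the node names are hypothetical, and the random selection is an assumption, since this thread does not say how a sentry chooses among its connected signers.

```python
import random


def build_full_mesh(cosigners, sentries):
    """Every cosigner dials every configured sentry (full mesh)."""
    return {sentry: list(cosigners) for sentry in sentries}


def route_sign_requests(connections, pick=random.choice):
    """Each sentry sends sign requests to exactly one connected signer.

    `pick` is an assumption standing in for whatever selection the
    sentry really performs.
    """
    return {sentry: pick(signers) for sentry, signers in connections.items()}


# Hypothetical names, for illustration only.
cosigners = ["cosigner-1", "cosigner-2", "cosigner-3"]
sentries = ["sentry-1", "sentry-2", "sentry-3"]

connections = build_full_mesh(cosigners, sentries)
routes = route_sign_requests(connections)

for sentry, signer in routes.items():
    print(f"{sentry}: {len(connections[sentry])} signers connected, "
          f"sign requests go to {signer}")
```

In this model a cosigner that is connected to a sentry may still receive no sign requests from it, which matches the explanation above.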
Thanks so much for your response!
I am not finding the phrase The sentries are fully synced and reporting Your assertion about not receiving signing requests matches the expectations I had about these symptoms, based on other GitHub issues I've unearthed in the Tendermint SDK GitHub repo. Unfortunately, the trail of breadcrumbs goes a bit dry from there and I'm not sure how to triage further. I would definitely appreciate any counsel you might have to offer.
Ok, so all of my sentries are When I restart one of the sentries that isn't spewing logs, the Horcrux cluster does gripe until the sentry comes back into service. Horcrux says it's connected, but the sentry continues to loop a connection timeout:
This does not appear to disrupt signing activity. |
Is your key registered as a validator on the chain? |
Yes. I originally spun up a completely separate testnet sentry node and used its keys to register as a validator, then migrated the keys to Horcrux. (That node is not one of the testnet sentries currently in play, in case I did a poor job of communicating that.) I still have the old infrastructure lying around; the node there has just been stopped since I migrated it. On the testnet explorer I can see that the validator is registered, active and signing. |
Can you confirm that your key shards have the correct |
I have isolated Secret objects populated for each of the signers, and the |
What is the RTT (ping) between the cosigners to each other, and also the cosigners to the sentries? |
~50ms when traversing Availability Zones, ~5ms when inside the same AZ. |
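For anyone who wants to reproduce this kind of measurement without ICMP (ping is sometimes blocked between pods), here is a minimal Python sketch that times a TCP handshake instead. This swaps in a different technique from ping: the connect time is only an approximation of one round trip. The function names and any hosts or ports passed to them are hypothetical, not anything taken from Horcrux.

```python
import socket
import statistics
import time


def tcp_connect_ms(host, port, timeout=2.0):
    """Time one TCP handshake in milliseconds; None if unreachable."""
    try:
        # Resolve first so DNS lookup time is not counted in the measurement.
        family, socktype, proto, _, addr = socket.getaddrinfo(
            host, port, type=socket.SOCK_STREAM
        )[0]
        with socket.socket(family, socktype, proto) as sock:
            sock.settimeout(timeout)
            start = time.perf_counter()
            sock.connect(addr)
            return (time.perf_counter() - start) * 1000.0
    except OSError:
        # Covers refused connections, timeouts, and resolution failures.
        return None


def probe(targets, samples=5):
    """Return the median connect time per (host, port), or None if unreachable."""
    results = {}
    for host, port in targets:
        attempts = (tcp_connect_ms(host, port) for _ in range(samples))
        times = [t for t in attempts if t is not None]
        results[(host, port)] = statistics.median(times) if times else None
    return results
```

Calling `probe` from each cosigner pod with a list of the other cosigners' and the sentries' `(host, port)` pairs gives a latency figure per link, and a `None` result doubles as a reachability check similar to a failed telnet.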
Did you change the cosigner configuration in |
They were all initialized with the proper The keys are mounted on the filesystem, the volumes themselves are reprovisioned in between attempts/reconfigurations/etc, so there's no disk-based file cache in play as far as I'm aware. NB: The |
Closing this as stale. Please open a new issue if you notice this behavior again on v3.1.0+. |
Hi there,
I'm trying to implement a Horcrux POC that would mimic my Production architecture with Quicksilver Testnet + Cosmovisor. I have three Sentry nodes up and synced and I'm on Horcrux v3. Processes are all running in plain Kubernetes with no RBAC and no network policies. TCP connectivity has all been tested with telnet, and the processes can all talk to each other just fine.
Scenario 1
When I plug each individual Signer into a dedicated Sentry, nothing happens.
Signer configs, chainNodes.privValAddr changes depending on the node:
Sentry logs:
It appears to be acting as if none of the signing requests are going through to the signers. Based on what I've managed to sift out of Tendermint issues like this one, this behavior is expected when signing isn't happening at all, but ultimately I'm not sure.
Signer logs all look the same, the target address of course changes depending on which Signer is pointed at which Sentry:
Scenario 2
If I hand off this config to all the Signers, with all of the Sentries included in the chainNodes list:
What am I doing wrong? Any help would be super appreciated. If I've missed a glaringly obvious thing in the docs about how this is all supposed to work, feel free to just drop the link on my face. Ideally I'd like to have a full mesh or 1:1 arrangement between Sentries and Signer processes for maximum fault tolerance.
Thanks in advance.