[Bug] Worker hangs after polling workflow task queue #631
Comments
We do check that the worker can connect to the namespace using "describe namespace". So you have a token that works on some calls but not others? Is this self-hosted or cloud? The stack trace seems to be showing it raising an exception. Are you sure that |
This issue is two-fold. I reported the part relevant to the SDK here:
The second part is what I'm reporting here. The worker ends up raising a |
Some client errors we can detect as recoverable. For ones we can't, we still try to recover for a minute before failing the worker. We intentionally fail the worker instead of letting it operate in a failed state on something that is not quickly/obviously recoverable. Can you confirm whether |
I'm not sure if you're referring to something different, but the logs shared in the original bug report show that |
Ah, I see it in the trace now. This is intentional behavior. A fatal error (or at least one we can't tell is non-fatal) that doesn't fix itself after a minute will cause the worker to fail and shut down instead of silently pretending to work while continuing to fail. You may restart the worker if you wish, though many prefer to investigate rather than restart blindly. |
In our case the worker is failing after a number of intermittent network issues, so we would rather it restart instead. Is there any recommendation on how to restart the worker? I tried a simple |
This should work. Similarly, you can consider having whatever is monitoring the process/pod/container do the restart at an outer level. I would also recommend at least alerting on fatal worker errors; otherwise you won't know your worker isn't working. |
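The outer-level approach described above could look roughly like the sketch below: the entrypoint logs the fatal error at a level an alerting system can pick up, then reports a non-zero exit code so the process/pod/container supervisor performs the restart. This is only an illustration. `supervise` and `run_worker` are hypothetical names, and it assumes a fatal worker error surfaces as an exception from whatever coroutine creates and runs the worker.

```python
import asyncio
import logging
from typing import Awaitable, Callable

logger = logging.getLogger("worker-entrypoint")


async def supervise(run_worker: Callable[[], Awaitable[None]]) -> int:
    """Run the worker once and translate the outcome into a process exit code.

    `run_worker` is a hypothetical coroutine function that creates a worker
    and awaits it until it stops (assumption, not part of any SDK).
    """
    try:
        await run_worker()
    except Exception:
        # CRITICAL-level log with traceback, so alerting can key off it.
        logger.critical("worker failed fatally; exiting for outer restart", exc_info=True)
        return 1
    # Clean shutdown: nothing for the supervisor to restart.
    return 0
```

An entrypoint could then call `sys.exit(asyncio.run(supervise(run_worker)))` and leave the restart policy to the container or process manager.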
I did try the following to retry the worker:
After running for a while, the worker crashed with the following error when trying to restart:
|
A worker is meant for a single run/shutdown cycle. If you want to run a worker again, you need to create a new one. Unfortunately, some validation we added happens before this check; we will fix that. |
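A minimal sketch of an in-process restart loop that respects the one-run-per-worker rule is shown below: it builds a fresh worker object on every attempt instead of calling `run()` again on the failed one. The names `run_with_restarts`, `make_worker`, `max_restarts`, and `backoff_seconds` are hypothetical, and the sketch assumes only that the worker object exposes an async `run()` method that raises on fatal failure.

```python
import asyncio
import logging
from typing import Any, Callable

logger = logging.getLogger("worker-restart")


async def run_with_restarts(
    make_worker: Callable[[], Any],
    max_restarts: int = 5,
    backoff_seconds: float = 5.0,
) -> None:
    """Run a worker, recreating it from scratch after each fatal failure."""
    failures = 0
    while True:
        # Always build a new instance; a worker that has run once is not reused.
        worker = make_worker()
        try:
            await worker.run()
            return  # run() returned normally: clean shutdown, stop looping
        except Exception:
            failures += 1
            logger.exception("worker failed (failure %d)", failures)
            if failures > max_restarts:
                raise  # give up and surface the last error to the caller
            await asyncio.sleep(backoff_seconds)
```

With the Temporal Python SDK, `make_worker` could be something like `lambda: Worker(client, task_queue="my-task-queue", workflows=[MyWorkflow])`, where the task queue name and workflow class are placeholders. Capping the number of restarts keeps a persistent failure, such as a bad token, from looping forever unnoticed.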
What are you really trying to do?
To connect a worker to a Temporal server with an `authorization` header.
Describe the bug
Starting a worker with an incorrect token, which causes the server to respond with `Request unauthorized`, makes the worker hang indefinitely.
Minimal Reproduction
Output:
Environment/Versions
Additional context
Related issue: #459