Workflow is Error but taskset node is not Error when Agent pod failed #14200
Comments
So weird, this issue should have been fixed by #12723. Logs as below show that node
In addition, completed taskset nodes in
hello-plugin-2816962999:
boundaryID: hello-plugin
displayName: hello
finishedAt: "2025-02-17T09:53:43Z" # finishedAt is not nil, so it has already been marked as completed.
id: hello-plugin-2816962999
message: Queuing # I have never seen this message here before; it should be 'agent pod failed with reason...'
name: hello-plugin.hello
phase: Pending
progress: 0/1
startedAt: "2025-02-17T09:50:11Z"
templateName: hello-plugin
templateScope: local/hello-plugin
type: Plugin
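The inconsistency in the status above (finishedAt is set while phase is still Pending) can be checked mechanically. The following is only a minimal sketch: nodeStatus, fulfilled and inconsistent are hypothetical names for illustration and are not the controller's own types or functions.

```go
package main

import "fmt"

// nodeStatus is a hypothetical, trimmed-down view of a workflow node status.
// It is NOT the controller's real type; it only mirrors the fields shown above.
type nodeStatus struct {
	id         string
	phase      string
	finishedAt string // empty means the node has not finished
}

// fulfilled reports whether a phase is terminal.
func fulfilled(phase string) bool {
	switch phase {
	case "Succeeded", "Failed", "Error", "Skipped", "Omitted":
		return true
	}
	return false
}

// inconsistent flags a node that carries a finish time but a non-terminal phase.
func inconsistent(n nodeStatus) bool {
	return n.finishedAt != "" && !fulfilled(n.phase)
}

func main() {
	n := nodeStatus{
		id:         "hello-plugin-2816962999",
		phase:      "Pending",
		finishedAt: "2025-02-17T09:53:43Z",
	}
	fmt.Println(inconsistent(n)) // prints true
}
```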
@jswxstw You're right. The message for hello-plugin-2816962999 was updated repeatedly, so 'agent pod failed with reason...' was overwritten.
Within a single operate call:
The workflow would have ended by this point. In what situation would the redundant reconcileTaskSet you mentioned occur?
@jswxstw This test modification will trigger the error. All of the steps mentioned above happen within a single operate call.
The redundant reconcileTaskSet refers to unnecessarily reconciling the taskSet when woc.taskSet is empty. The correct behavior is to skip the reconciliation in such cases.
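To illustrate the idea of skipping reconciliation when there is nothing to reconcile, here is a minimal sketch. It is not the controller's actual code: operationCtx, taskSpec and reconcileCalls are hypothetical stand-ins, and only the names reconcileTaskSet and taskSet are taken from the discussion above.

```go
package main

import "fmt"

// taskSpec is a hypothetical stand-in for one pending task-set entry.
type taskSpec struct{ templateName string }

// operationCtx is a hypothetical, simplified stand-in for the operation context.
type operationCtx struct {
	taskSet        map[string]taskSpec
	reconcileCalls int // counts how often real reconciliation work ran
}

// reconcileTaskSet returns early when the task set is empty, so no
// reconciliation (and no node status update) happens in that case.
// It reports whether any reconciliation work was performed.
func (woc *operationCtx) reconcileTaskSet() bool {
	if len(woc.taskSet) == 0 {
		return false // nothing to reconcile; skip
	}
	woc.reconcileCalls++ // placeholder for the real reconciliation work
	return true
}

func main() {
	empty := &operationCtx{}
	fmt.Println(empty.reconcileTaskSet()) // prints false

	busy := &operationCtx{taskSet: map[string]taskSpec{
		"hello-plugin-2816962999": {templateName: "hello-plugin"},
	}}
	fmt.Println(busy.reconcileTaskSet()) // prints true
}
```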
@Tuilot Would you like to submit a PR to fix this?
@jswxstw Yes.
Pre-requisites
I have tested with the :latest image tag (i.e. quay.io/argoproj/workflow-controller:latest) and can confirm the issue still exists on :latest. If not, I have explained why, in detail, in my description below.
What happened? What did you expect to happen?
When I submit the workflow to the argo namespace, the agent pod fails.
The workflow then moves to the Error state, but the taskset node is still Pending.
Version(s)
latest
Paste a minimal workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.
Logs from the workflow controller
Logs from your workflow's wait container