
v3.4.9 podGC: onPodCompletion not working properly #11588

Open
2 of 3 tasks
popsu opened this issue Aug 16, 2023 · 19 comments
Labels

  • area/controller — Controller issues, panics
  • area/gc — Garbage collection, such as TTLs, retentionPolicy, delays, and more
  • P1 — High priority. All bugs with >=5 thumbs up that aren't P0, plus: Any other bugs deemed high priority
  • type/bug
  • type/regression — Regression from previous behavior (a specific type of bug)

Comments

@popsu

popsu commented Aug 16, 2023

Pre-requisites

  • I have double-checked my configuration
  • I can confirm the issue exists when I tested with :latest
  • I'd like to contribute the fix myself (see contributing guide)

What happened/what you expected to happen?

Workflow-controller garbage collection doesn't work properly.

After we updated our Argo Workflows from 3.3.8 to 3.4.9, the Argo controller started failing to remove completed pods. We use the podGC: OnPodCompletion strategy for all workflows in our controller ConfigMap. Most of the time the pod is removed properly, but for maybe 1-5% of workflows the finished pod is not removed and just stays in the cluster as Completed.
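For context, this is roughly how that strategy is applied to all workflows through the controller ConfigMap's workflowDefaults (illustrative excerpt, not our full config):

```yaml
# workflow-controller-configmap (illustrative excerpt)
apiVersion: v1
kind: ConfigMap
metadata:
  name: workflow-controller-configmap
data:
  workflowDefaults: |
    spec:
      podGC:
        strategy: OnPodCompletion   # delete each pod as soon as it completes
```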

Looking at the workflow-controller logs, there seems to be a problem with garbage collection: the same workflow keeps being put into the garbage-collection queue multiple times, even though it should only be queued once. The logs are now constantly spamming lines like the ones below, which wasn't the case with 3.3.8.

I don't understand why this workflow is put into the queue multiple times. Once should be enough, and the logs from the previous version show that's how it used to behave.

Here's the number of log lines from the controller, from our DataDog (ignore the color changes). The count was stable and low on the previous version, but after we updated to 3.4.9 it went up. At the end of the graph we rolled back to 3.3.8 and the count returned to the previous level.

Screenshot from 2023-08-16 14-46-50

Version

v3.4.9

Paste a small workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.

-

Logs from the workflow controller

$ kubectl logs workflow-controller-7f8bcdc656-9gvg4 | grep "argo-managed-processor-staging/ca-keystone-feature-parity-testing-q42nt"
time="2023-08-16T09:14:55.884Z" level=info msg="Queueing Succeeded workflow argo-managed-processor-staging/ca-keystone-feature-parity-testing-q42nt for delete in 6h17m37s due to TTL"
time="2023-08-16T09:15:00.118Z" level=info msg="Queueing Succeeded workflow argo-managed-processor-staging/ca-keystone-feature-parity-testing-q42nt for delete in 6h17m32s due to TTL"
time="2023-08-16T09:22:42.304Z" level=info msg="Queueing Succeeded workflow argo-managed-processor-staging/ca-keystone-feature-parity-testing-q42nt for delete in 6h9m50s due to TTL"
time="2023-08-16T09:32:28.796Z" level=info msg="Queueing Succeeded workflow argo-managed-processor-staging/ca-keystone-feature-parity-testing-q42nt for delete in 6h0m4s due to TTL"
time="2023-08-16T09:54:55.019Z" level=info msg="Queueing Succeeded workflow argo-managed-processor-staging/ca-keystone-feature-parity-testing-q42nt for delete in 5h37m37s due to TTL"
time="2023-08-16T09:57:37.509Z" level=info msg="Queueing Succeeded workflow argo-managed-processor-staging/ca-keystone-feature-parity-testing-q42nt for delete in 5h34m55s due to TTL"
time="2023-08-16T10:02:42.185Z" level=info msg="Queueing Succeeded workflow argo-managed-processor-staging/ca-keystone-feature-parity-testing-q42nt for delete in 5h29m50s due to TTL"
time="2023-08-16T10:10:17.360Z" level=info msg="Queueing Succeeded workflow argo-managed-processor-staging/ca-keystone-feature-parity-testing-q42nt for delete in 5h22m15s due to TTL"
time="2023-08-16T10:14:55.166Z" level=info msg="Queueing Succeeded workflow argo-managed-processor-staging/ca-keystone-feature-parity-testing-q42nt for delete in 5h17m37s due to TTL"
time="2023-08-16T10:19:45.878Z" level=info msg="Queueing Succeeded workflow argo-managed-processor-staging/ca-keystone-feature-parity-testing-q42nt for delete in 5h12m47s due to TTL"
time="2023-08-16T10:25:22.294Z" level=info msg="Queueing Succeeded workflow argo-managed-processor-staging/ca-keystone-feature-parity-testing-q42nt for delete in 5h7m10s due to TTL"
time="2023-08-16T10:33:06.901Z" level=info msg="Queueing Succeeded workflow argo-managed-processor-staging/ca-keystone-feature-parity-testing-q42nt for delete in 4h59m26s due to TTL"
time="2023-08-16T10:34:56.043Z" level=info msg="Queueing Succeeded workflow argo-managed-processor-staging/ca-keystone-feature-parity-testing-q42nt for delete in 4h57m36s due to TTL"
time="2023-08-16T10:41:50.563Z" level=info msg="Queueing Succeeded workflow argo-managed-processor-staging/ca-keystone-feature-parity-testing-q42nt for delete in 4h50m42s due to TTL"
time="2023-08-16T10:46:52.196Z" level=info msg="Queueing Succeeded workflow argo-managed-processor-staging/ca-keystone-feature-parity-testing-q42nt for delete in 4h45m40s due to TTL"
time="2023-08-16T10:54:55.429Z" level=info msg="Queueing Succeeded workflow argo-managed-processor-staging/ca-keystone-feature-parity-testing-q42nt for delete in 4h37m37s due to TTL"
time="2023-08-16T10:55:52.254Z" level=info msg="Queueing Succeeded workflow argo-managed-processor-staging/ca-keystone-feature-parity-testing-q42nt for delete in 4h36m40s due to TTL"
time="2023-08-16T11:02:24.189Z" level=info msg="Queueing Succeeded workflow argo-managed-processor-staging/ca-keystone-feature-parity-testing-q42nt for delete in 4h30m8s due to TTL"

Logs from in your workflow's wait container

-
@popsu popsu added the type/bug label Aug 16, 2023
@terrytangyuan
Member

Can you check your K8s apiserver logs?

@popsu
Author

popsu commented Aug 16, 2023

Can you check your K8s apiserver logs?

Yeah, they are available to me, but I'm not sure what to check exactly. Just to verify that nobody asked to delete the pod (even though the Argo controller should have)?

@popsu
Author

popsu commented Aug 16, 2023

I checked the k8s apiserver audit logs for one of the pods (used in a workflow) that was not deleted by the workflow-controller GC (but should have been). The last row is a manual deletion that was done the next day; the Argo workflow-controller should have deleted the pod much earlier.

Screenshot from 2023-08-16 18-03-23

@agilgur5 agilgur5 added the type/regression Regression from previous behavior (a specific type of bug) label Aug 16, 2023
@terrytangyuan
Member

Can you turn on debug level log for workflow controller?

@popsu
Author

popsu commented Aug 17, 2023

Can you turn on debug level log for workflow controller?

Ah right, good idea 👍. I'll turn on debug-level logging and see if I can find anything suspicious in there.
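(For reference, I enabled it by adding the log-level flag to the workflow-controller Deployment's container args, something like this illustrative snippet:)

```yaml
# workflow-controller Deployment (illustrative excerpt)
containers:
  - name: workflow-controller
    args:
      - --configmap=workflow-controller-configmap
      - --loglevel=debug   # default is "info"
```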

@popsu
Author

popsu commented Aug 17, 2023

$ k logs workflow-controller-ddfdc8d8b-zmxd7 | grep -i "Delete pods 404" -C3 | grep -i "cleaning up pod"
time="2023-08-17T10:32:42.994Z" level=info msg="cleaning up pod" action=deletePod key=argo-managed-processor-staging/ca-keystone-feature-parity-testing-jlj95-1340600742-agent/deletePod
time="2023-08-17T10:32:47.032Z" level=info msg="cleaning up pod" action=deletePod key=argo-managed-processor-staging/ca-keystone-feature-parity-testing-g98sx-1340600742-agent/deletePod
time="2023-08-17T10:32:58.886Z" level=info msg="cleaning up pod" action=deletePod key=argo-managed-processor-staging/ca-keystone-feature-parity-testing-575zv-1340600742-agent/deletePod
time="2023-08-17T10:33:10.937Z" level=info msg="cleaning up pod" action=deletePod key=argo-managed-processor-staging/ca-keystone-feature-parity-testing-txwjh-1340600742-agent/deletePod
time="2023-08-17T10:33:15.209Z" level=info msg="cleaning up pod" action=deletePod key=argo-managed-processor-staging/ca-keystone-feature-parity-testing-dnqzb-1340600742-agent/deletePod
time="2023-08-17T10:33:25.018Z" level=info msg="cleaning up pod" action=deletePod key=argo-managed-processor-staging/ca-keystone-feature-parity-testing-9k5h9-1340600742-agent/deletePod
time="2023-08-17T10:33:38.055Z" level=info msg="cleaning up pod" action=deletePod key=argo-managed-processor-staging/ca-keystone-feature-parity-testing-j8zvl-1340600742-agent/deletePod
time="2023-08-17T10:35:12.182Z" level=info msg="cleaning up pod" action=deletePod key=argo-managed-processor-staging/ca-keystone-feature-parity-testing-b4zx9-1340600742-agent/deletePod
time="2023-08-17T10:47:37.024Z" level=info msg="cleaning up pod" action=deletePod key=argo-managed-processor-staging/processing-sar-auto-qa-stage-gforms-czxgk-1340600742-agent/deletePod
time="2023-08-17T10:49:56.984Z" level=info msg="cleaning up pod" action=deletePod key=argo-managed-processor-staging/ca-keystone-feature-parity-testing-9j92x-1340600742-agent/deletePod
time="2023-08-17T10:49:57.969Z" level=info msg="cleaning up pod" action=deletePod key=argo-managed-processor-staging/ca-keystone-feature-parity-testing-8gjw4-1340600742-agent/deletePod
time="2023-08-17T10:51:22.181Z" level=info msg="cleaning up pod" action=deletePod key=argo-managed-processor-staging/ca-keystone-feature-parity-testing-5rm2x-1340600742-agent/deletePod
time="2023-08-17T10:51:24.192Z" level=info msg="cleaning up pod" action=deletePod key=argo-managed-processor-staging/ca-keystone-feature-parity-testing-hwczx-1340600742-agent/deletePod
time="2023-08-17T10:51:27.344Z" level=info msg="cleaning up pod" action=deletePod key=argo-managed-processor-staging/ca-keystone-feature-parity-testing-4wkcc-1340600742-agent/deletePod
time="2023-08-17T10:51:31.223Z" level=info msg="cleaning up pod" action=deletePod key=argo-managed-processor-staging/ca-keystone-feature-parity-testing-wzb9f-1340600742-agent/deletePod
time="2023-08-17T10:51:32.294Z" level=info msg="cleaning up pod" action=deletePod key=argo-managed-processor-staging/ca-keystone-feature-parity-testing-nvn4f-1340600742-agent/deletePod
time="2023-08-17T10:51:32.462Z" level=info msg="cleaning up pod" action=deletePod key=argo-managed-processor-staging/ca-keystone-feature-parity-testing-5gtt4-1340600742-agent/deletePod
time="2023-08-17T10:51:36.453Z" level=info msg="cleaning up pod" action=deletePod key=argo-managed-processor-staging/ca-keystone-feature-parity-testing-xw8x6-1340600742-agent/deletePod
time="2023-08-17T11:13:28.895Z" level=info msg="cleaning up pod" action=deletePod key=argo-managed-processor-staging/ca-keystone-feature-parity-testing-s66l5-1340600742-agent/deletePod

What is this 1340600742-agent thing? I first grepped the logs for Delete pods 404 because I noticed some pod deletions got a 404. That turned up all of these pods, but after checking DataDog, none of them appear to exist.

But it seems this is some Argo-internal naming? When I googled it, a few argo-workflows issues mention that name, and the repo also uses it in a test: https://github.com/argoproj/argo-workflows/blob/v3.4.8/workflow/controller/operator_agent_test.go#L74

edit: This might not be an issue; I see a few similar lines in our production environment (which currently runs 3.3.8 and doesn't have the issue described in this bug report).

@terrytangyuan
Member

Those are agent pods responsible for executing HTTP templates.

@juliev0
Contributor

juliev0 commented Aug 24, 2023

Regarding the "Queueing Succeeded workflow" log line: does that have anything to do with this? I would think it would be relevant if you were using PodGC: OnWorkflowCompletion, right? But OnPodCompletion shouldn't have anything to do with the Workflow having completed, should it? (Although I haven't looked at the code, so maybe it does.)
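To make the distinction concrete: those "Queueing ... for delete ... due to TTL" lines come from the Workflow-level TTL, which deletes the Workflow object itself, while podGC controls when the Pods are deleted. An illustrative spec fragment (values made up):

```yaml
# illustrative Workflow spec fragment
spec:
  ttlStrategy:
    secondsAfterCompletion: 86400   # delete the Workflow object 24h after completion
  podGC:
    strategy: OnPodCompletion       # delete each Pod as soon as it completes
```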

@juliev0 juliev0 added the P1 label and removed the P2 label Aug 24, 2023
@popsu
Author

popsu commented Sep 1, 2023

I was looking into this more. It seems that after we updated to 3.4.9 we also started getting 409s for workflow API calls in the Kubernetes audit logs. With 3.3.8 we had basically none of these, but I'm not sure if or how this is related:

Screenshot from 2023-09-01 17-10-30

@popsu
Author

popsu commented Sep 1, 2023

These are changes to one of the workflows which didn't properly clean up all its pods.

[Edit: I think these patches are not relevant to the issue, since it seems there are 409s later also when only the controller itself is doing updates.]

We use some patches in our workflows, and I think those account for rows 5-8 in this picture? There is a 409 immediately after them, but there are also some 409s later (rows 16, 18, 20) when only the argo-controller is making updates to the workflow. We run 2 controller pods, but leader election should ensure only one of them makes changes, so I'm not sure why it's getting these 409s (or where to find more information):

    - name: label-man
      inputs:
        parameters:
          - name: label-name
          - name: label-value
      resource:
        action: patch
        mergeStrategy: json
        flags:
          - workflow
          - "{{workflow.name}}"
        manifest: |
          - op: add
            path: "/metadata/labels/{{inputs.parameters.label-name}}"
            value: "{{inputs.parameters.label-value}}"

Screenshot from 2023-09-01 17-15-09

@popsu
Author

popsu commented Sep 1, 2023

Looking at the workflow-controller logs related to this workflow, it seems that the OnExit handler was run multiple times. We send a Slack notification from it, and I can see a duplicate Slack notification. Only one of the exit-handler pods gets garbage-collected; the other stays there.

time="2023-08-15T09:40:52.152Z" level=info msg="Processing workflow" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:40:53.686Z" level=info msg="Updated phase -> Running" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:40:53.686Z" level=info msg="Creating pvc sqs-sar-processor-production-8pwns-workvol" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:40:53.695Z" level=info msg="Creating pvc sqs-sar-processor-production-8pwns-workvol-qa" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:40:53.711Z" level=info msg="DAG node sqs-sar-processor-production-8pwns initialized Running" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:40:53.711Z" level=info msg="All of node sqs-sar-processor-production-8pwns.save-dcr-metadata dependencies [] completed" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:40:53.712Z" level=info msg="Pod node sqs-sar-processor-production-8pwns-4116272344 initialized Pending" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:40:53.745Z" level=info msg="Created pod: sqs-sar-processor-production-8pwns.save-dcr-metadata (sqs-sar-processor-production-8pwns-read-s3-metadata-4116272344)" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:40:53.746Z" level=info msg="TaskSet Reconciliation" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:40:53.746Z" level=info msg=reconcileAgentPod namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:40:53.789Z" level=info msg="Workflow update successful" namespace=argo-managed-processor-production phase=Running resourceVersion=1228756717 workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:41:03.752Z" level=info msg="Processing workflow" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:41:03.752Z" level=info msg="Task-result reconciliation" namespace=argo-managed-processor-production numObjs=0 workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:41:03.752Z" level=warning msg="workflow uses legacy/insecure pod patch, see https://argoproj.github.io/argo-workflows/workflow-rbac/" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:41:03.754Z" level=info msg="node changed" namespace=argo-managed-processor-production new.message= new.phase=Succeeded new.progress=0/1 nodeID=sqs-sar-processor-production-8pwns-4116272344 old.message= old.phase=Pending old.progress=0/1 workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:41:03.767Z" level=info msg="All of node sqs-sar-processor-production-8pwns.generate-parameters dependencies [save-dcr-metadata] completed" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:41:03.768Z" level=info msg="Pod node sqs-sar-processor-production-8pwns-1406809507 initialized Pending" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:41:03.810Z" level=info msg="Created pod: sqs-sar-processor-production-8pwns.generate-parameters (sqs-sar-processor-production-8pwns-generate-parameters-1406809507)" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:41:03.818Z" level=info msg="TaskSet Reconciliation" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:41:03.818Z" level=info msg=reconcileAgentPod namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:41:03.869Z" level=info msg="Workflow update successful" namespace=argo-managed-processor-production phase=Running resourceVersion=1228757002 workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:41:08.875Z" level=info msg="cleaning up pod" action=deletePod key=argo-managed-processor-production/sqs-sar-processor-production-8pwns-read-s3-metadata-4116272344/deletePod
time="2023-08-15T09:41:13.822Z" level=info msg="Processing workflow" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:41:13.823Z" level=info msg="Task-result reconciliation" namespace=argo-managed-processor-production numObjs=0 workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:41:13.823Z" level=warning msg="workflow uses legacy/insecure pod patch, see https://argoproj.github.io/argo-workflows/workflow-rbac/" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:41:13.823Z" level=info msg="node changed" namespace=argo-managed-processor-production new.message= new.phase=Succeeded new.progress=0/1 nodeID=sqs-sar-processor-production-8pwns-1406809507 old.message= old.phase=Pending old.progress=0/1 workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:41:13.836Z" level=info msg="All of node sqs-sar-processor-production-8pwns.download-dcr dependencies [generate-parameters] completed" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:41:13.837Z" level=info msg="Pod node sqs-sar-processor-production-8pwns-2695146199 initialized Pending" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:41:13.885Z" level=info msg="Created pod: sqs-sar-processor-production-8pwns.download-dcr (sqs-sar-processor-production-8pwns-download-dcr-cli-2695146199)" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:41:13.885Z" level=info msg="All of node sqs-sar-processor-production-8pwns.do-dcr-config-download dependencies [generate-parameters] completed" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:41:13.885Z" level=info msg="Skipped node sqs-sar-processor-production-8pwns-3154072718 initialized Skipped (message: when ''false' == 'false' && 'false' == 'true'' evaluated false)" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:41:13.885Z" level=info msg="All of node sqs-sar-processor-production-8pwns.save-command-api-config dependencies [generate-parameters] completed" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:41:13.886Z" level=info msg="Skipped node sqs-sar-processor-production-8pwns-2055644340 initialized Skipped (message: when ''false' == 'true'' evaluated false)" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:41:13.886Z" level=info msg="All of node sqs-sar-processor-production-8pwns.add-id-label dependencies [generate-parameters] completed" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:41:13.886Z" level=info msg="Pod node sqs-sar-processor-production-8pwns-3607818475 initialized Pending" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:41:13.918Z" level=info msg="Created pod: sqs-sar-processor-production-8pwns.add-id-label (sqs-sar-processor-production-8pwns-label-man-3607818475)" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:41:13.918Z" level=info msg="All of node sqs-sar-processor-production-8pwns.add-run-label dependencies [generate-parameters] completed" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:41:13.918Z" level=info msg="Pod node sqs-sar-processor-production-8pwns-1096920377 initialized Pending" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:41:13.948Z" level=info msg="Created pod: sqs-sar-processor-production-8pwns.add-run-label (sqs-sar-processor-production-8pwns-label-man-1096920377)" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:41:13.950Z" level=info msg="All of node sqs-sar-processor-production-8pwns.add-trace-label dependencies [generate-parameters] completed" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:41:13.951Z" level=info msg="Skipped node sqs-sar-processor-production-8pwns-2636514637 initialized Skipped (message: when ''false' == 'true'' evaluated false)" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:41:13.951Z" level=info msg="All of node sqs-sar-processor-production-8pwns.add-tracing-enabled-label dependencies [generate-parameters] completed" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:41:13.952Z" level=info msg="Skipped node sqs-sar-processor-production-8pwns-1833346002 initialized Skipped (message: when ''false' == 'true'' evaluated false)" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:41:13.952Z" level=info msg="TaskSet Reconciliation" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:41:13.952Z" level=info msg=reconcileAgentPod namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:41:14.000Z" level=info msg="Workflow update successful" namespace=argo-managed-processor-production phase=Running resourceVersion=1228757277 workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:41:19.007Z" level=info msg="cleaning up pod" action=deletePod key=argo-managed-processor-production/sqs-sar-processor-production-8pwns-generate-parameters-1406809507/deletePod
time="2023-08-15T09:41:23.889Z" level=info msg="Processing workflow" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:41:23.889Z" level=info msg="Task-result reconciliation" namespace=argo-managed-processor-production numObjs=0 workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:41:23.889Z" level=info msg="node changed" namespace=argo-managed-processor-production new.message=PodInitializing new.phase=Pending new.progress=0/1 nodeID=sqs-sar-processor-production-8pwns-2695146199 old.message= old.phase=Pending old.progress=0/1 workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:41:23.890Z" level=info msg="node changed" namespace=argo-managed-processor-production new.message= new.phase=Running new.progress=0/1 nodeID=sqs-sar-processor-production-8pwns-3607818475 old.message= old.phase=Pending old.progress=0/1 workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:41:23.890Z" level=info msg="node changed" namespace=argo-managed-processor-production new.message= new.phase=Running new.progress=0/1 nodeID=sqs-sar-processor-production-8pwns-1096920377 old.message= old.phase=Pending old.progress=0/1 workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:41:23.907Z" level=info msg="TaskSet Reconciliation" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:41:23.907Z" level=info msg=reconcileAgentPod namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:41:23.949Z" level=info msg="Workflow update successful" namespace=argo-managed-processor-production phase=Running resourceVersion=1228757555 workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:41:39.709Z" level=info msg="Processing workflow" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:41:39.710Z" level=info msg="Task-result reconciliation" namespace=argo-managed-processor-production numObjs=0 workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:41:39.710Z" level=warning msg="workflow uses legacy/insecure pod patch, see https://argoproj.github.io/argo-workflows/workflow-rbac/" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:41:39.710Z" level=info msg="node changed" namespace=argo-managed-processor-production new.message= new.phase=Succeeded new.progress=0/1 nodeID=sqs-sar-processor-production-8pwns-2695146199 old.message=PodInitializing old.phase=Pending old.progress=0/1 workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:41:39.710Z" level=info msg="node unchanged" namespace=argo-managed-processor-production nodeID=sqs-sar-processor-production-8pwns-3607818475 workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:41:39.710Z" level=info msg="node unchanged" namespace=argo-managed-processor-production nodeID=sqs-sar-processor-production-8pwns-1096920377 workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:41:39.722Z" level=info msg="All of node sqs-sar-processor-production-8pwns.pulsepower dependencies [download-dcr] completed" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:41:39.723Z" level=info msg="Pod node sqs-sar-processor-production-8pwns-2066690381 initialized Pending" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:41:39.776Z" level=info msg="Created pod: sqs-sar-processor-production-8pwns.pulsepower (sqs-sar-processor-production-8pwns-pulsepower-2066690381)" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:41:39.788Z" level=info msg="All of node sqs-sar-processor-production-8pwns.preprocess-qa dependencies [download-dcr] completed" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:41:39.788Z" level=info msg="Pod node sqs-sar-processor-production-8pwns-2513189394 initialized Pending" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:41:39.825Z" level=info msg="Created pod: sqs-sar-processor-production-8pwns.preprocess-qa (sqs-sar-processor-production-8pwns-preprocess-qa-2513189394)" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:41:39.826Z" level=info msg="TaskSet Reconciliation" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:41:39.826Z" level=info msg=reconcileAgentPod namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:41:39.860Z" level=warning msg="Error updating workflow: Operation cannot be fulfilled on workflows.argoproj.io \"sqs-sar-processor-production-8pwns\": the object has been modified; please apply your changes to the latest version and try again Conflict" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:41:39.860Z" level=info msg="Re-applying updates on latest version and retrying update" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:41:39.955Z" level=info msg="Update retry attempt 1 successful" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:41:39.955Z" level=info msg="Workflow update successful" namespace=argo-managed-processor-production phase=Running resourceVersion=1228757963 workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:41:44.961Z" level=info msg="cleaning up pod" action=deletePod key=argo-managed-processor-production/sqs-sar-processor-production-8pwns-download-dcr-cli-2695146199/deletePod
time="2023-08-15T09:41:49.778Z" level=info msg="Processing workflow" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:41:49.779Z" level=info msg="Task-result reconciliation" namespace=argo-managed-processor-production numObjs=0 workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:41:49.779Z" level=info msg="node changed" namespace=argo-managed-processor-production new.message=PodInitializing new.phase=Pending new.progress=0/1 nodeID=sqs-sar-processor-production-8pwns-2066690381 old.message= old.phase=Pending old.progress=0/1 workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:41:49.779Z" level=info msg="node changed" namespace=argo-managed-processor-production new.message= new.phase=Succeeded new.progress=0/1 nodeID=sqs-sar-processor-production-8pwns-1096920377 old.message= old.phase=Running old.progress=0/1 workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:41:49.779Z" level=info msg="node changed" namespace=argo-managed-processor-production new.message=PodInitializing new.phase=Pending new.progress=0/1 nodeID=sqs-sar-processor-production-8pwns-2513189394 old.message= old.phase=Pending old.progress=0/1 workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:41:49.779Z" level=info msg="node changed" namespace=argo-managed-processor-production new.message= new.phase=Succeeded new.progress=0/1 nodeID=sqs-sar-processor-production-8pwns-3607818475 old.message= old.phase=Running old.progress=0/1 workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:41:49.792Z" level=info msg="TaskSet Reconciliation" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:41:49.792Z" level=info msg=reconcileAgentPod namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:41:49.835Z" level=info msg="Workflow update successful" namespace=argo-managed-processor-production phase=Running resourceVersion=1228758221 workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:41:54.839Z" level=info msg="cleaning up pod" action=deletePod key=argo-managed-processor-production/sqs-sar-processor-production-8pwns-label-man-3607818475/deletePod
time="2023-08-15T09:41:54.839Z" level=info msg="cleaning up pod" action=deletePod key=argo-managed-processor-production/sqs-sar-processor-production-8pwns-label-man-1096920377/deletePod
time="2023-08-15T09:42:00.100Z" level=info msg="Processing workflow" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:00.100Z" level=info msg="Task-result reconciliation" namespace=argo-managed-processor-production numObjs=0 workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:00.100Z" level=warning msg="workflow uses legacy/insecure pod patch, see https://argoproj.github.io/argo-workflows/workflow-rbac/" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:00.101Z" level=info msg="node changed" namespace=argo-managed-processor-production new.message= new.phase=Succeeded new.progress=0/1 nodeID=sqs-sar-processor-production-8pwns-2513189394 old.message=PodInitializing old.phase=Pending old.progress=0/1 workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:00.101Z" level=info msg="Pod failed: Error (exit code 1)" displayName=pulsepower namespace=argo-managed-processor-production pod=sqs-sar-processor-production-8pwns-pulsepower-2066690381 templateName=pulsepower workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:00.101Z" level=warning msg="workflow uses legacy/insecure pod patch, see https://argoproj.github.io/argo-workflows/workflow-rbac/" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:00.101Z" level=info msg="node changed" namespace=argo-managed-processor-production new.message="Error (exit code 1)" new.phase=Failed new.progress=0/1 nodeID=sqs-sar-processor-production-8pwns-2066690381 old.message=PodInitializing old.phase=Pending old.progress=0/1 workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:00.114Z" level=info msg="Skipped node sqs-sar-processor-production-8pwns-3649619691 initialized Omitted (message: omitted: depends condition not met)" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:00.115Z" level=info msg="Skipped node sqs-sar-processor-production-8pwns-3051136182 initialized Omitted (message: omitted: depends condition not met)" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:00.115Z" level=info msg="Skipped node sqs-sar-processor-production-8pwns-1555690789 initialized Omitted (message: omitted: depends condition not met)" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:00.115Z" level=info msg="Skipped node sqs-sar-processor-production-8pwns-1590775209 initialized Omitted (message: omitted: depends condition not met)" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:00.116Z" level=info msg="Skipped node sqs-sar-processor-production-8pwns-1977079561 initialized Omitted (message: omitted: depends condition not met)" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:00.117Z" level=info msg="All of node sqs-sar-processor-production-8pwns.slack-notify-abort-preprocess-check-failure dependencies [pulsepower] completed" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:00.118Z" level=info msg="Retry node sqs-sar-processor-production-8pwns-515552909 initialized Running" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:00.118Z" level=info msg="Pod node sqs-sar-processor-production-8pwns-2469401052 initialized Pending" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:00.171Z" level=info msg="Created pod: sqs-sar-processor-production-8pwns.slack-notify-abort-preprocess-check-failure(0) (sqs-sar-processor-production-8pwns-slack-send-2469401052)" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:00.172Z" level=info msg="Skipped node sqs-sar-processor-production-8pwns-3480997215 initialized Omitted (message: omitted: depends condition not met)" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:00.172Z" level=info msg="Skipped node sqs-sar-processor-production-8pwns-2003323528 initialized Omitted (message: omitted: depends condition not met)" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:00.172Z" level=info msg="Skipped node sqs-sar-processor-production-8pwns-690644360 initialized Omitted (message: omitted: depends condition not met)" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:00.172Z" level=info msg="Skipped node sqs-sar-processor-production-8pwns-1277658783 initialized Omitted (message: omitted: depends condition not met)" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:00.172Z" level=info msg="Skipped node sqs-sar-processor-production-8pwns-3569950184 initialized Omitted (message: omitted: depends condition not met)" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:00.172Z" level=info msg="Skipped node sqs-sar-processor-production-8pwns-1038190901 initialized Omitted (message: omitted: depends condition not met)" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:00.173Z" level=info msg="Skipped node sqs-sar-processor-production-8pwns-2793774353 initialized Omitted (message: omitted: depends condition not met)" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:00.173Z" level=info msg="Skipped node sqs-sar-processor-production-8pwns-566304367 initialized Omitted (message: omitted: depends condition not met)" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:00.173Z" level=info msg="Skipped node sqs-sar-processor-production-8pwns-2813658222 initialized Omitted (message: omitted: depends condition not met)" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:00.173Z" level=info msg="Skipped node sqs-sar-processor-production-8pwns-975619886 initialized Omitted (message: omitted: depends condition not met)" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:00.173Z" level=info msg="Skipped node sqs-sar-processor-production-8pwns-1335356588 initialized Omitted (message: omitted: depends condition not met)" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:00.173Z" level=info msg="Skipped node sqs-sar-processor-production-8pwns-31247671 initialized Omitted (message: omitted: depends condition not met)" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:00.173Z" level=info msg="Skipped node sqs-sar-processor-production-8pwns-1293817112 initialized Omitted (message: omitted: depends condition not met)" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:00.174Z" level=info msg="Skipped node sqs-sar-processor-production-8pwns-395271223 initialized Omitted (message: omitted: depends condition not met)" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:00.174Z" level=info msg="Skipped node sqs-sar-processor-production-8pwns-1309994339 initialized Omitted (message: omitted: depends condition not met)" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:00.174Z" level=info msg="Skipped node sqs-sar-processor-production-8pwns-724898383 initialized Omitted (message: omitted: depends condition not met)" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:00.174Z" level=info msg="Skipped node sqs-sar-processor-production-8pwns-1567701854 initialized Omitted (message: omitted: depends condition not met)" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:00.174Z" level=info msg="Skipped node sqs-sar-processor-production-8pwns-1961045326 initialized Omitted (message: omitted: depends condition not met)" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:00.174Z" level=info msg="Skipped node sqs-sar-processor-production-8pwns-2334882381 initialized Omitted (message: omitted: depends condition not met)" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:00.174Z" level=info msg="TaskSet Reconciliation" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:00.174Z" level=info msg=reconcileAgentPod namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:00.227Z" level=info msg="Workflow update successful" namespace=argo-managed-processor-production phase=Running resourceVersion=1228758483 workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:05.235Z" level=info msg="cleaning up pod" action=deletePod key=argo-managed-processor-production/sqs-sar-processor-production-8pwns-pulsepower-2066690381/deletePod
time="2023-08-15T09:42:05.236Z" level=info msg="cleaning up pod" action=deletePod key=argo-managed-processor-production/sqs-sar-processor-production-8pwns-preprocess-qa-2513189394/deletePod
time="2023-08-15T09:42:10.185Z" level=info msg="Processing workflow" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:10.185Z" level=info msg="Task-result reconciliation" namespace=argo-managed-processor-production numObjs=0 workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:10.186Z" level=warning msg="workflow uses legacy/insecure pod patch, see https://argoproj.github.io/argo-workflows/workflow-rbac/" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:10.186Z" level=info msg="node changed" namespace=argo-managed-processor-production new.message= new.phase=Succeeded new.progress=0/1 nodeID=sqs-sar-processor-production-8pwns-2469401052 old.message= old.phase=Pending old.progress=0/1 workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:10.201Z" level=info msg="node sqs-sar-processor-production-8pwns-515552909 phase Running -> Succeeded" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:10.201Z" level=info msg="node sqs-sar-processor-production-8pwns-515552909 finished: 2023-08-15 09:42:10.201540523 +0000 UTC" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:10.210Z" level=info msg="Outbound nodes of sqs-sar-processor-production-8pwns set to [sqs-sar-processor-production-8pwns-1555690789 sqs-sar-processor-production-8pwns-3607818475 sqs-sar-processor-production-8pwns-1590775209 sqs-sar-processor-production-8pwns-1096920377 sqs-sar-processor-production-8pwns-1977079561 sqs-sar-processor-production-8pwns-2636514637 sqs-sar-processor-production-8pwns-1833346002 sqs-sar-processor-production-8pwns-2513189394 sqs-sar-processor-production-8pwns-2469401052 sqs-sar-processor-production-8pwns-2003323528 sqs-sar-processor-production-8pwns-1038190901 sqs-sar-processor-production-8pwns-2793774353 sqs-sar-processor-production-8pwns-566304367 sqs-sar-processor-production-8pwns-2813658222 sqs-sar-processor-production-8pwns-975619886 sqs-sar-processor-production-8pwns-1335356588 sqs-sar-processor-production-8pwns-395271223 sqs-sar-processor-production-8pwns-724898383 sqs-sar-processor-production-8pwns-1567701854 sqs-sar-processor-production-8pwns-2334882381]" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:10.211Z" level=info msg="node sqs-sar-processor-production-8pwns phase Running -> Failed" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:10.211Z" level=info msg="node sqs-sar-processor-production-8pwns finished: 2023-08-15 09:42:10.211025708 +0000 UTC" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:10.211Z" level=info msg="Checking daemoned children of sqs-sar-processor-production-8pwns" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:10.211Z" level=info msg="TaskSet Reconciliation" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:10.211Z" level=info msg=reconcileAgentPod namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:10.211Z" level=info msg="Running OnExit handler: exit-handler" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:10.211Z" level=info msg="Steps node sqs-sar-processor-production-8pwns-411787230 initialized Running" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:10.211Z" level=info msg="StepGroup node sqs-sar-processor-production-8pwns-793041056 initialized Running" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:10.212Z" level=info msg="Retry node sqs-sar-processor-production-8pwns-3379985614 initialized Running" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:10.212Z" level=info msg="Pod node sqs-sar-processor-production-8pwns-1389842557 initialized Pending" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:10.252Z" level=info msg="Created pod: sqs-sar-processor-production-8pwns.onExit[0].exit-notification(0) (sqs-sar-processor-production-8pwns-slack-send-1389842557)" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:10.252Z" level=info msg="Workflow step group node sqs-sar-processor-production-8pwns-793041056 not yet completed" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:10.308Z" level=info msg="Workflow update successful" namespace=argo-managed-processor-production phase=Running resourceVersion=1228758769 workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:15.316Z" level=info msg="cleaning up pod" action=deletePod key=argo-managed-processor-production/sqs-sar-processor-production-8pwns-slack-send-2469401052/deletePod
time="2023-08-15T09:42:20.264Z" level=info msg="Processing workflow" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:20.265Z" level=info msg="Task-result reconciliation" namespace=argo-managed-processor-production numObjs=0 workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:20.265Z" level=warning msg="workflow uses legacy/insecure pod patch, see https://argoproj.github.io/argo-workflows/workflow-rbac/" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:20.265Z" level=info msg="node changed" namespace=argo-managed-processor-production new.message= new.phase=Succeeded new.progress=0/1 nodeID=sqs-sar-processor-production-8pwns-1389842557 old.message= old.phase=Pending old.progress=0/1 workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:20.276Z" level=info msg="TaskSet Reconciliation" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:20.276Z" level=info msg=reconcileAgentPod namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:20.276Z" level=info msg="Running OnExit handler: exit-handler" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:20.277Z" level=info msg="node sqs-sar-processor-production-8pwns-3379985614 phase Running -> Succeeded" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:20.277Z" level=info msg="node sqs-sar-processor-production-8pwns-3379985614 finished: 2023-08-15 09:42:20.277468408 +0000 UTC" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:20.277Z" level=info msg="Step group node sqs-sar-processor-production-8pwns-793041056 successful" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:20.277Z" level=info msg="node sqs-sar-processor-production-8pwns-793041056 phase Running -> Succeeded" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:20.277Z" level=info msg="node sqs-sar-processor-production-8pwns-793041056 finished: 2023-08-15 09:42:20.277955807 +0000 UTC" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:20.278Z" level=info msg="Outbound nodes of sqs-sar-processor-production-8pwns-3379985614 is [sqs-sar-processor-production-8pwns-1389842557]" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:20.278Z" level=info msg="Outbound nodes of sqs-sar-processor-production-8pwns-411787230 is [sqs-sar-processor-production-8pwns-1389842557]" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:20.278Z" level=info msg="node sqs-sar-processor-production-8pwns-411787230 phase Running -> Succeeded" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:20.278Z" level=info msg="node sqs-sar-processor-production-8pwns-411787230 finished: 2023-08-15 09:42:20.278547127 +0000 UTC" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:20.278Z" level=info msg="Checking daemoned children of sqs-sar-processor-production-8pwns-411787230" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:20.278Z" level=info msg="Updated phase Running -> Failed" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:20.278Z" level=info msg="Marking workflow completed" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:20.279Z" level=info msg="Marking workflow as pending archiving" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:20.279Z" level=info msg="Deleting PVC sqs-sar-processor-production-8pwns-workvol" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:20.290Z" level=info msg="Deleting PVC sqs-sar-processor-production-8pwns-workvol-qa" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:20.296Z" level=info msg="cleaning up pod" action=deletePod key=argo-managed-processor-production/sqs-sar-processor-production-8pwns-1340600742-agent/deletePod
time="2023-08-15T09:42:20.301Z" level=info msg="Removing PVC \"kubernetes.io/pvc-protection\" finalizer" claimName=sqs-sar-processor-production-8pwns-workvol namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:20.319Z" level=info msg="Removing PVC \"kubernetes.io/pvc-protection\" finalizer" claimName=sqs-sar-processor-production-8pwns-workvol-qa namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:20.335Z" level=info msg="Deleted 2/2 PVCs" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:20.335Z" level=info msg="Checking daemoned children of " namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:20.389Z" level=info msg="Workflow update successful" namespace=argo-managed-processor-production phase=Failed resourceVersion=1228759025 workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:25.401Z" level=info msg="cleaning up pod" action=deletePod key=argo-managed-processor-production/sqs-sar-processor-production-8pwns-slack-send-1389842557/deletePod
time="2023-08-15T09:42:38.854Z" level=info msg="Processing workflow" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:38.856Z" level=info msg="Task-result reconciliation" namespace=argo-managed-processor-production numObjs=0 workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:38.856Z" level=info msg="Workflow pod is missing" namespace=argo-managed-processor-production nodeName="sqs-sar-processor-production-8pwns.slack-notify-abort-preprocess-check-failure(0)" nodePhase=Pending recentlyStarted=false workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:38.928Z" level=info msg="Created pod: sqs-sar-processor-production-8pwns.slack-notify-abort-preprocess-check-failure(0) (sqs-sar-processor-production-8pwns-slack-send-2469401052)" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:38.938Z" level=info msg="TaskSet Reconciliation" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:38.938Z" level=info msg=reconcileAgentPod namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:48.932Z" level=info msg="Processing workflow" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:48.932Z" level=info msg="Task-result reconciliation" namespace=argo-managed-processor-production numObjs=0 workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:48.933Z" level=warning msg="workflow uses legacy/insecure pod patch, see https://argoproj.github.io/argo-workflows/workflow-rbac/" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:48.933Z" level=info msg="node changed" namespace=argo-managed-processor-production new.message= new.phase=Succeeded new.progress=0/1 nodeID=sqs-sar-processor-production-8pwns-2469401052 old.message= old.phase=Pending old.progress=0/1 workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:48.947Z" level=info msg="node sqs-sar-processor-production-8pwns-515552909 phase Running -> Succeeded" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:48.947Z" level=info msg="node sqs-sar-processor-production-8pwns-515552909 finished: 2023-08-15 09:42:48.947974914 +0000 UTC" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:48.956Z" level=info msg="Outbound nodes of sqs-sar-processor-production-8pwns set to [sqs-sar-processor-production-8pwns-1555690789 sqs-sar-processor-production-8pwns-3607818475 sqs-sar-processor-production-8pwns-1590775209 sqs-sar-processor-production-8pwns-1096920377 sqs-sar-processor-production-8pwns-1977079561 sqs-sar-processor-production-8pwns-2636514637 sqs-sar-processor-production-8pwns-1833346002 sqs-sar-processor-production-8pwns-2513189394 sqs-sar-processor-production-8pwns-2469401052 sqs-sar-processor-production-8pwns-2003323528 sqs-sar-processor-production-8pwns-1038190901 sqs-sar-processor-production-8pwns-2793774353 sqs-sar-processor-production-8pwns-566304367 sqs-sar-processor-production-8pwns-2813658222 sqs-sar-processor-production-8pwns-975619886 sqs-sar-processor-production-8pwns-1335356588 sqs-sar-processor-production-8pwns-395271223 sqs-sar-processor-production-8pwns-724898383 sqs-sar-processor-production-8pwns-1567701854 sqs-sar-processor-production-8pwns-2334882381]" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:48.956Z" level=info msg="node sqs-sar-processor-production-8pwns phase Running -> Failed" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:48.956Z" level=info msg="node sqs-sar-processor-production-8pwns finished: 2023-08-15 09:42:48.956507704 +0000 UTC" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:48.956Z" level=info msg="Checking daemoned children of sqs-sar-processor-production-8pwns" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:48.956Z" level=info msg="TaskSet Reconciliation" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:48.956Z" level=info msg=reconcileAgentPod namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:48.956Z" level=info msg="Running OnExit handler: exit-handler" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:48.956Z" level=info msg="Steps node sqs-sar-processor-production-8pwns-411787230 initialized Running" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:48.956Z" level=info msg="StepGroup node sqs-sar-processor-production-8pwns-793041056 initialized Running" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:48.957Z" level=info msg="Retry node sqs-sar-processor-production-8pwns-3379985614 initialized Running" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:48.957Z" level=info msg="Pod node sqs-sar-processor-production-8pwns-1389842557 initialized Pending" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:48.997Z" level=info msg="Created pod: sqs-sar-processor-production-8pwns.onExit[0].exit-notification(0) (sqs-sar-processor-production-8pwns-slack-send-1389842557)" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:48.997Z" level=info msg="Workflow step group node sqs-sar-processor-production-8pwns-793041056 not yet completed" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:49.075Z" level=warning msg="Error updating workflow: Operation cannot be fulfilled on workflows.argoproj.io \"sqs-sar-processor-production-8pwns\": the object has been modified; please apply your changes to the latest version and try again Conflict" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:49.075Z" level=info msg="Re-applying updates on latest version and retrying update" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:49.121Z" level=info msg="Failed to re-apply update" error="must never update completed workflows" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:59.002Z" level=info msg="Processing workflow" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:59.004Z" level=info msg="Task-result reconciliation" namespace=argo-managed-processor-production numObjs=0 workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:59.005Z" level=warning msg="workflow uses legacy/insecure pod patch, see https://argoproj.github.io/argo-workflows/workflow-rbac/" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:59.005Z" level=info msg="node changed" namespace=argo-managed-processor-production new.message= new.phase=Succeeded new.progress=0/1 nodeID=sqs-sar-processor-production-8pwns-2469401052 old.message= old.phase=Pending old.progress=0/1 workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:59.020Z" level=info msg="node sqs-sar-processor-production-8pwns-515552909 phase Running -> Succeeded" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:59.020Z" level=info msg="node sqs-sar-processor-production-8pwns-515552909 finished: 2023-08-15 09:42:59.020616834 +0000 UTC" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:59.029Z" level=info msg="Outbound nodes of sqs-sar-processor-production-8pwns set to [sqs-sar-processor-production-8pwns-1555690789 sqs-sar-processor-production-8pwns-3607818475 sqs-sar-processor-production-8pwns-1590775209 sqs-sar-processor-production-8pwns-1096920377 sqs-sar-processor-production-8pwns-1977079561 sqs-sar-processor-production-8pwns-2636514637 sqs-sar-processor-production-8pwns-1833346002 sqs-sar-processor-production-8pwns-2513189394 sqs-sar-processor-production-8pwns-2469401052 sqs-sar-processor-production-8pwns-2003323528 sqs-sar-processor-production-8pwns-1038190901 sqs-sar-processor-production-8pwns-2793774353 sqs-sar-processor-production-8pwns-566304367 sqs-sar-processor-production-8pwns-2813658222 sqs-sar-processor-production-8pwns-975619886 sqs-sar-processor-production-8pwns-1335356588 sqs-sar-processor-production-8pwns-395271223 sqs-sar-processor-production-8pwns-724898383 sqs-sar-processor-production-8pwns-1567701854 sqs-sar-processor-production-8pwns-2334882381]" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:59.030Z" level=info msg="node sqs-sar-processor-production-8pwns phase Running -> Failed" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:59.030Z" level=info msg="node sqs-sar-processor-production-8pwns finished: 2023-08-15 09:42:59.030082039 +0000 UTC" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:59.030Z" level=info msg="Checking daemoned children of sqs-sar-processor-production-8pwns" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:59.030Z" level=info msg="TaskSet Reconciliation" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:59.030Z" level=info msg=reconcileAgentPod namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:59.030Z" level=info msg="Running OnExit handler: exit-handler" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:59.030Z" level=info msg="Steps node sqs-sar-processor-production-8pwns-411787230 initialized Running" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:59.030Z" level=info msg="StepGroup node sqs-sar-processor-production-8pwns-793041056 initialized Running" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:59.031Z" level=info msg="Retry node sqs-sar-processor-production-8pwns-3379985614 initialized Running" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:59.031Z" level=info msg="Pod node sqs-sar-processor-production-8pwns-1389842557 initialized Pending" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:59.031Z" level=info msg="Workflow step group node sqs-sar-processor-production-8pwns-793041056 not yet completed" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:59.060Z" level=warning msg="Error updating workflow: Operation cannot be fulfilled on workflows.argoproj.io \"sqs-sar-processor-production-8pwns\": the object has been modified; please apply your changes to the latest version and try again Conflict" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:59.060Z" level=info msg="Re-applying updates on latest version and retrying update" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:42:59.099Z" level=info msg="Failed to re-apply update" error="must never update completed workflows" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:43:09.065Z" level=info msg="Processing workflow" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:43:09.066Z" level=info msg="Task-result reconciliation" namespace=argo-managed-processor-production numObjs=0 workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:43:09.066Z" level=warning msg="workflow uses legacy/insecure pod patch, see https://argoproj.github.io/argo-workflows/workflow-rbac/" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:43:09.066Z" level=info msg="node changed" namespace=argo-managed-processor-production new.message= new.phase=Succeeded new.progress=0/1 nodeID=sqs-sar-processor-production-8pwns-2469401052 old.message= old.phase=Pending old.progress=0/1 workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:43:09.081Z" level=info msg="node sqs-sar-processor-production-8pwns-515552909 phase Running -> Succeeded" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:43:09.081Z" level=info msg="node sqs-sar-processor-production-8pwns-515552909 finished: 2023-08-15 09:43:09.081803628 +0000 UTC" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:43:09.093Z" level=info msg="Outbound nodes of sqs-sar-processor-production-8pwns set to [sqs-sar-processor-production-8pwns-1555690789 sqs-sar-processor-production-8pwns-3607818475 sqs-sar-processor-production-8pwns-1590775209 sqs-sar-processor-production-8pwns-1096920377 sqs-sar-processor-production-8pwns-1977079561 sqs-sar-processor-production-8pwns-2636514637 sqs-sar-processor-production-8pwns-1833346002 sqs-sar-processor-production-8pwns-2513189394 sqs-sar-processor-production-8pwns-2469401052 sqs-sar-processor-production-8pwns-2003323528 sqs-sar-processor-production-8pwns-1038190901 sqs-sar-processor-production-8pwns-2793774353 sqs-sar-processor-production-8pwns-566304367 sqs-sar-processor-production-8pwns-2813658222 sqs-sar-processor-production-8pwns-975619886 sqs-sar-processor-production-8pwns-1335356588 sqs-sar-processor-production-8pwns-395271223 sqs-sar-processor-production-8pwns-724898383 sqs-sar-processor-production-8pwns-1567701854 sqs-sar-processor-production-8pwns-2334882381]" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:43:09.093Z" level=info msg="node sqs-sar-processor-production-8pwns phase Running -> Failed" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:43:09.093Z" level=info msg="node sqs-sar-processor-production-8pwns finished: 2023-08-15 09:43:09.093162956 +0000 UTC" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:43:09.093Z" level=info msg="Checking daemoned children of sqs-sar-processor-production-8pwns" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:43:09.093Z" level=info msg="TaskSet Reconciliation" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:43:09.093Z" level=info msg=reconcileAgentPod namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:43:09.093Z" level=info msg="Running OnExit handler: exit-handler" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:43:09.094Z" level=info msg="Steps node sqs-sar-processor-production-8pwns-411787230 initialized Running" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:43:09.094Z" level=info msg="StepGroup node sqs-sar-processor-production-8pwns-793041056 initialized Running" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:43:09.095Z" level=info msg="Retry node sqs-sar-processor-production-8pwns-3379985614 initialized Running" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:43:09.096Z" level=info msg="Pod node sqs-sar-processor-production-8pwns-1389842557 initialized Pending" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:43:09.096Z" level=info msg="Workflow step group node sqs-sar-processor-production-8pwns-793041056 not yet completed" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:43:09.121Z" level=warning msg="Error updating workflow: Operation cannot be fulfilled on workflows.argoproj.io \"sqs-sar-processor-production-8pwns\": the object has been modified; please apply your changes to the latest version and try again Conflict" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:43:09.122Z" level=info msg="Re-applying updates on latest version and retrying update" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:43:09.151Z" level=info msg="Failed to re-apply update" error="must never update completed workflows" namespace=argo-managed-processor-production workflow=sqs-sar-processor-production-8pwns
time="2023-08-15T09:43:50.594Z" level=info msg="archiving workflow" namespace=argo-managed-processor-production uid=994d8894-9f0b-4206-bd29-8d49926c7e21 workflow=sqs-sar-processor-production-8pwns

@agilgur5 added the area/controller label Sep 8, 2023

jmaicher commented Sep 8, 2023

We see a similar issue after upgrading from 3.4.4 to 3.4.9. In our case, workflows occasionally complete successfully but then transition to an error/failure state afterwards; the exit handler, which was skipped on success, runs again and sends a failure notification.

Here is our workflow (simplified):

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: example-workflow-
spec:
  entrypoint: main
  onExit: exit-handler
  podGC:
    strategy: OnWorkflowCompletion
  templates:
    - name: main
      dag: ...
    - name: exit-handler
      steps:
        - - name: notify-failed
            templateRef:
              name: slack
              template: notify-failed
            when: "{{workflow.status}} != Succeeded"

From the workflow controller logs below you can see that the workflow completes at 2023-09-07T06:08:00.021Z and is marked as pending archiving. The exit handler's notify-failed step was skipped because "Succeeded != Succeeded" evaluated to false. At 2023-09-07T06:08:12.946Z the task-result reconciliation runs again; by then the workflow pod has been deleted, so the main step is marked as failed. The exit handler is then triggered again at 2023-09-07T06:08:12.959Z and fires the failure notification. Since the workflow had already completed, the subsequent workflow update fails with "must never update completed workflows" (around 2023-09-07T06:08:23.574Z). The exit handler is triggered once more after that, but it doesn't send another failure notification.

time="2023-09-07T06:07:21.086Z" level=info msg="Processing workflow" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:07:21.196Z" level=info msg="adding artifact GC finalizer" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:07:21.196Z" level=info msg="Updated phase  -> Running" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:07:21.196Z" level=info msg="DAG node example-workflow-hq9pn initialized Running" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:07:21.196Z" level=info msg="All of node example-workflow-hq9pn.main dependencies [] completed" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:07:21.201Z" level=info msg="Steps node example-workflow-hq9pn-663554177 initialized Running" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:07:21.201Z" level=info msg="StepGroup node example-workflow-hq9pn-3224806949 initialized Running" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:07:21.204Z" level=info msg="Pod node example-workflow-hq9pn-437922017 initialized Pending" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:07:22.041Z" level=info msg="Created pod: example-workflow-hq9pn.main[0].main-worker (example-workflow-hq9pn-sisyphus-project-template-fan-out-437922017)" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:07:22.041Z" level=info msg="Workflow step group node example-workflow-hq9pn-3224806949 not yet completed" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:07:22.041Z" level=info msg="TaskSet Reconciliation" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:07:22.041Z" level=info msg=reconcileAgentPod namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:07:22.145Z" level=info msg="Workflow update successful" namespace=workflows phase=Running resourceVersion=1330498802 workflow=example-workflow-hq9pn
time="2023-09-07T06:07:32.047Z" level=info msg="Processing workflow" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:07:32.050Z" level=info msg="Task-result reconciliation" namespace=workflows numObjs=0 workflow=example-workflow-hq9pn
time="2023-09-07T06:07:32.050Z" level=info msg="node changed" namespace=workflows new.message=PodInitializing new.phase=Pending new.progress=0/1 nodeID=example-workflow-hq9pn-437922017 old.message= old.phase=Pending old.progress=0/1 workflow=example-workflow-hq9pn
time="2023-09-07T06:07:32.060Z" level=info msg="Workflow step group node example-workflow-hq9pn-3224806949 not yet completed" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:07:32.060Z" level=info msg="TaskSet Reconciliation" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:07:32.060Z" level=info msg=reconcileAgentPod namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:07:33.471Z" level=info msg="Workflow update successful" namespace=workflows phase=Running resourceVersion=1330499198 workflow=example-workflow-hq9pn
time="2023-09-07T06:07:42.173Z" level=info msg="Processing workflow" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:07:42.177Z" level=info msg="Task-result reconciliation" namespace=workflows numObjs=0 workflow=example-workflow-hq9pn
time="2023-09-07T06:07:42.177Z" level=info msg="node changed" namespace=workflows new.message= new.phase=Running new.progress=0/1 nodeID=example-workflow-hq9pn-437922017 old.message=PodInitializing old.phase=Pending old.progress=0/1 workflow=example-workflow-hq9pn
time="2023-09-07T06:07:42.191Z" level=info msg="Workflow step group node example-workflow-hq9pn-3224806949 not yet completed" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:07:42.191Z" level=info msg="TaskSet Reconciliation" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:07:42.191Z" level=info msg=reconcileAgentPod namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:07:42.315Z" level=info msg="Workflow update successful" namespace=workflows phase=Running resourceVersion=1330499543 workflow=example-workflow-hq9pn
time="2023-09-07T06:08:00.004Z" level=info msg="Processing workflow" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:00.008Z" level=info msg="Task-result reconciliation" namespace=workflows numObjs=1 workflow=example-workflow-hq9pn
time="2023-09-07T06:08:00.008Z" level=info msg="task-result changed" namespace=workflows nodeID=example-workflow-hq9pn-437922017 workflow=example-workflow-hq9pn
time="2023-09-07T06:08:00.008Z" level=info msg="node changed" namespace=workflows new.message= new.phase=Succeeded new.progress=0/1 nodeID=example-workflow-hq9pn-437922017 old.message=PodInitializing old.phase=Pending old.progress=0/1 workflow=example-workflow-hq9pn
time="2023-09-07T06:08:00.017Z" level=info msg="Step group node example-workflow-hq9pn-3224806949 successful" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:00.017Z" level=info msg="node example-workflow-hq9pn-3224806949 phase Running -> Succeeded" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:00.017Z" level=info msg="node example-workflow-hq9pn-3224806949 finished: 2023-09-07 06:08:00.017528779 +0000 UTC" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:00.017Z" level=info msg="StepGroup node example-workflow-hq9pn-2084075952 initialized Running" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:00.020Z" level=info msg="SG Outbound nodes of example-workflow-hq9pn-437922017 are [example-workflow-hq9pn-437922017]" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:00.020Z" level=info msg="Skipping example-workflow-hq9pn.main[1].parallel-workers: Skipped, empty params" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:00.020Z" level=info msg="Skipped node example-workflow-hq9pn-2686192655 initialized Skipped (message: Skipped, empty params)" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:00.020Z" level=info msg="Step group node example-workflow-hq9pn-2084075952 successful" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:00.020Z" level=info msg="node example-workflow-hq9pn-2084075952 phase Running -> Succeeded" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:00.020Z" level=info msg="node example-workflow-hq9pn-2084075952 finished: 2023-09-07 06:08:00.020978629 +0000 UTC" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:00.020Z" level=info msg="Outbound nodes of example-workflow-hq9pn-2686192655 is [example-workflow-hq9pn-2686192655]" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:00.021Z" level=info msg="Outbound nodes of example-workflow-hq9pn-663554177 is [example-workflow-hq9pn-2686192655]" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:00.021Z" level=info msg="node example-workflow-hq9pn-663554177 phase Running -> Succeeded" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:00.021Z" level=info msg="node example-workflow-hq9pn-663554177 finished: 2023-09-07 06:08:00.02105617 +0000 UTC" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:00.021Z" level=info msg="Checking daemoned children of example-workflow-hq9pn-663554177" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:00.021Z" level=info msg="Outbound nodes of example-workflow-hq9pn set to [example-workflow-hq9pn-2686192655]" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:00.021Z" level=info msg="node example-workflow-hq9pn phase Running -> Succeeded" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:00.021Z" level=info msg="node example-workflow-hq9pn finished: 2023-09-07 06:08:00.021116821 +0000 UTC" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:00.021Z" level=info msg="Checking daemoned children of example-workflow-hq9pn" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:00.021Z" level=info msg="TaskSet Reconciliation" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:00.021Z" level=info msg=reconcileAgentPod namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:00.021Z" level=info msg="Running OnExit handler: exit-handler" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:00.021Z" level=info msg="Steps node example-workflow-hq9pn-3648729681 initialized Running" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:00.021Z" level=info msg="StepGroup node example-workflow-hq9pn-1754934389 initialized Running" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:00.021Z" level=info msg="Skipping example-workflow-hq9pn.onExit[0].notify-failed: when 'Succeeded != Succeeded' evaluated false" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:00.021Z" level=info msg="Skipped node example-workflow-hq9pn-180906186 initialized Skipped (message: when 'Succeeded != Succeeded' evaluated false)" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:00.021Z" level=info msg="Step group node example-workflow-hq9pn-1754934389 successful" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:00.021Z" level=info msg="node example-workflow-hq9pn-1754934389 phase Running -> Succeeded" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:00.021Z" level=info msg="node example-workflow-hq9pn-1754934389 finished: 2023-09-07 06:08:00.021359114 +0000 UTC" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:00.021Z" level=info msg="Outbound nodes of example-workflow-hq9pn-180906186 is [example-workflow-hq9pn-180906186]" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:00.021Z" level=info msg="Outbound nodes of example-workflow-hq9pn-3648729681 is [example-workflow-hq9pn-180906186]" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:00.021Z" level=info msg="node example-workflow-hq9pn-3648729681 phase Running -> Succeeded" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:00.021Z" level=info msg="node example-workflow-hq9pn-3648729681 finished: 2023-09-07 06:08:00.021395154 +0000 UTC" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:00.021Z" level=info msg="Checking daemoned children of example-workflow-hq9pn-3648729681" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:00.021Z" level=info msg="Updated phase Running -> Succeeded" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:00.021Z" level=info msg="Marking workflow completed" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:00.021Z" level=info msg="Marking workflow as pending archiving" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:00.021Z" level=info msg="Checking daemoned children of " namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:00.026Z" level=info msg="cleaning up pod" action=deletePod key=workflows/example-workflow-hq9pn-1340600742-agent/deletePod
time="2023-09-07T06:08:00.050Z" level=warning msg="Error updating workflow: Operation cannot be fulfilled on workflows.argoproj.io \"example-workflow-hq9pn\": the object has been modified; please apply your changes to the latest version and try again Conflict" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:00.050Z" level=info msg="Re-applying updates on latest version and retrying update" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:00.189Z" level=info msg="Update retry attempt 1 successful" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:00.189Z" level=info msg="Workflow update successful" namespace=workflows phase=Succeeded resourceVersion=1330499987 workflow=example-workflow-hq9pn
time="2023-09-07T06:08:05.272Z" level=info msg="cleaning up pod" action=deletePod key=workflows/example-workflow-hq9pn-sisyphus-project-template-fan-out-437922017/deletePod
time="2023-09-07T06:08:12.943Z" level=info msg="Processing workflow" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:12.946Z" level=info msg="Task-result reconciliation" namespace=workflows numObjs=0 workflow=example-workflow-hq9pn
time="2023-09-07T06:08:12.946Z" level=info msg="Workflow pod is missing" namespace=workflows nodeName="example-workflow-hq9pn.main[0].main-worker" nodePhase=Running recentlyStarted=false workflow=example-workflow-hq9pn
time="2023-09-07T06:08:12.946Z" level=info msg="node example-workflow-hq9pn-437922017 phase Running -> Error" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:12.946Z" level=info msg="node example-workflow-hq9pn-437922017 message: pod deleted" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:12.946Z" level=info msg="node example-workflow-hq9pn-437922017 finished: 2023-09-07 06:08:12.946896888 +0000 UTC" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:12.953Z" level=info msg="Step group node example-workflow-hq9pn-3224806949 deemed failed: child 'example-workflow-hq9pn-437922017' failed" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:12.953Z" level=info msg="node example-workflow-hq9pn-3224806949 phase Running -> Failed" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:12.953Z" level=info msg="node example-workflow-hq9pn-3224806949 message: child 'example-workflow-hq9pn-437922017' failed" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:12.953Z" level=info msg="node example-workflow-hq9pn-3224806949 finished: 2023-09-07 06:08:12.953441194 +0000 UTC" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:12.953Z" level=info msg="step group example-workflow-hq9pn-3224806949 was unsuccessful: child 'example-workflow-hq9pn-437922017' failed" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:12.956Z" level=info msg="Outbound nodes of example-workflow-hq9pn-437922017 is [example-workflow-hq9pn-437922017]" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:12.956Z" level=info msg="Outbound nodes of example-workflow-hq9pn-663554177 is [example-workflow-hq9pn-437922017]" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:12.956Z" level=info msg="node example-workflow-hq9pn-663554177 phase Running -> Failed" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:12.956Z" level=info msg="node example-workflow-hq9pn-663554177 message: child 'example-workflow-hq9pn-437922017' failed" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:12.956Z" level=info msg="node example-workflow-hq9pn-663554177 finished: 2023-09-07 06:08:12.956142645 +0000 UTC" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:12.956Z" level=info msg="Checking daemoned children of example-workflow-hq9pn-663554177" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:12.958Z" level=info msg="Outbound nodes of example-workflow-hq9pn set to [example-workflow-hq9pn-437922017]" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:12.958Z" level=info msg="node example-workflow-hq9pn phase Running -> Failed" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:12.958Z" level=info msg="node example-workflow-hq9pn finished: 2023-09-07 06:08:12.958950947 +0000 UTC" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:12.958Z" level=info msg="Checking daemoned children of example-workflow-hq9pn" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:12.958Z" level=info msg="TaskSet Reconciliation" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:12.959Z" level=info msg=reconcileAgentPod namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:12.959Z" level=info msg="Running OnExit handler: exit-handler" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:12.959Z" level=info msg="Steps node example-workflow-hq9pn-3648729681 initialized Running" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:12.959Z" level=info msg="StepGroup node example-workflow-hq9pn-1754934389 initialized Running" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:12.962Z" level=info msg="Pod node example-workflow-hq9pn-180906186 initialized Pending" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:13.450Z" level=info msg="Created pod: example-workflow-hq9pn.onExit[0].notify-failed (example-workflow-hq9pn-notify-failed-180906186)" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:13.450Z" level=info msg="Workflow step group node example-workflow-hq9pn-1754934389 not yet completed" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:13.659Z" level=warning msg="Error updating workflow: Operation cannot be fulfilled on workflows.argoproj.io \"example-workflow-hq9pn\": the object has been modified; please apply your changes to the latest version and try again Conflict" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:13.659Z" level=info msg="Re-applying updates on latest version and retrying update" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:13.723Z" level=info msg="Failed to re-apply update" error="must never update completed workflows" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:23.456Z" level=info msg="Processing workflow" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:23.459Z" level=info msg="Task-result reconciliation" namespace=workflows numObjs=0 workflow=example-workflow-hq9pn
time="2023-09-07T06:08:23.459Z" level=info msg="Workflow pod is missing" namespace=workflows nodeName="example-workflow-hq9pn.main[0].main-worker" nodePhase=Running recentlyStarted=false workflow=example-workflow-hq9pn
time="2023-09-07T06:08:23.459Z" level=info msg="node example-workflow-hq9pn-437922017 phase Running -> Error" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:23.459Z" level=info msg="node example-workflow-hq9pn-437922017 message: pod deleted" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:23.459Z" level=info msg="node example-workflow-hq9pn-437922017 finished: 2023-09-07 06:08:23.459449342 +0000 UTC" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:23.465Z" level=info msg="Step group node example-workflow-hq9pn-3224806949 deemed failed: child 'example-workflow-hq9pn-437922017' failed" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:23.465Z" level=info msg="node example-workflow-hq9pn-3224806949 phase Running -> Failed" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:23.465Z" level=info msg="node example-workflow-hq9pn-3224806949 message: child 'example-workflow-hq9pn-437922017' failed" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:23.465Z" level=info msg="node example-workflow-hq9pn-3224806949 finished: 2023-09-07 06:08:23.465387731 +0000 UTC" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:23.465Z" level=info msg="step group example-workflow-hq9pn-3224806949 was unsuccessful: child 'example-workflow-hq9pn-437922017' failed" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:23.467Z" level=info msg="Outbound nodes of example-workflow-hq9pn-437922017 is [example-workflow-hq9pn-437922017]" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:23.467Z" level=info msg="Outbound nodes of example-workflow-hq9pn-663554177 is [example-workflow-hq9pn-437922017]" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:23.467Z" level=info msg="node example-workflow-hq9pn-663554177 phase Running -> Failed" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:23.467Z" level=info msg="node example-workflow-hq9pn-663554177 message: child 'example-workflow-hq9pn-437922017' failed" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:23.467Z" level=info msg="node example-workflow-hq9pn-663554177 finished: 2023-09-07 06:08:23.467948371 +0000 UTC" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:23.467Z" level=info msg="Checking daemoned children of example-workflow-hq9pn-663554177" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:23.470Z" level=info msg="Outbound nodes of example-workflow-hq9pn set to [example-workflow-hq9pn-437922017]" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:23.470Z" level=info msg="node example-workflow-hq9pn phase Running -> Failed" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:23.470Z" level=info msg="node example-workflow-hq9pn finished: 2023-09-07 06:08:23.47054533 +0000 UTC" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:23.470Z" level=info msg="Checking daemoned children of example-workflow-hq9pn" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:23.470Z" level=info msg="TaskSet Reconciliation" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:23.470Z" level=info msg=reconcileAgentPod namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:23.470Z" level=info msg="Running OnExit handler: exit-handler" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:23.470Z" level=info msg="Steps node example-workflow-hq9pn-3648729681 initialized Running" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:23.470Z" level=info msg="StepGroup node example-workflow-hq9pn-1754934389 initialized Running" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:23.473Z" level=info msg="Pod node example-workflow-hq9pn-180906186 initialized Pending" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:23.473Z" level=info msg="Workflow step group node example-workflow-hq9pn-1754934389 not yet completed" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:23.533Z" level=warning msg="Error updating workflow: Operation cannot be fulfilled on workflows.argoproj.io \"example-workflow-hq9pn\": the object has been modified; please apply your changes to the latest version and try again Conflict" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:23.533Z" level=info msg="Re-applying updates on latest version and retrying update" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:23.574Z" level=info msg="Failed to re-apply update" error="must never update completed workflows" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:23.577Z" level=info msg="archiving workflow" namespace=workflows uid=c7812cc5-421a-47d4-a9fa-73a609aaba1f workflow=example-workflow-hq9pn
time="2023-09-07T06:08:33.578Z" level=info msg="Processing workflow" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:33.583Z" level=info msg="Checking daemoned children of " namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:33.619Z" level=warning msg="Error updating workflow: Operation cannot be fulfilled on workflows.argoproj.io \"example-workflow-hq9pn\": the object has been modified; please apply your changes to the latest version and try again Conflict" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:33.619Z" level=info msg="Re-applying updates on latest version and retrying update" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:33.634Z" level=info msg="Failed to re-apply update" error="must never update completed workflows" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:43.765Z" level=info msg="Queueing Succeeded workflow workflows/example-workflow-hq9pn for delete in 59m17s due to TTL"
time="2023-09-07T06:08:53.765Z" level=info msg="Processing workflow" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:53.768Z" level=info msg="Checking daemoned children of " namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:08:53.925Z" level=info msg="Workflow update successful" namespace=workflows phase=Succeeded resourceVersion=1330502461 workflow=example-workflow-hq9pn
time="2023-09-07T06:08:59.012Z" level=info msg="cleaning up pod" action=deletePod key=workflows/example-workflow-hq9pn-notify-failed-180906186/deletePod
time="2023-09-07T06:09:09.105Z" level=info msg="Processing workflow" namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:09:09.109Z" level=info msg="Checking daemoned children of " namespace=workflows workflow=example-workflow-hq9pn
time="2023-09-07T06:09:10.918Z" level=info msg="Queueing Succeeded workflow workflows/example-workflow-hq9pn for delete in 58m50s due to TTL"
time="2023-09-07T06:09:47.020Z" level=info msg="Queueing Succeeded workflow workflows/example-workflow-hq9pn for delete in 58m13s due to TTL"
time="2023-09-07T06:09:55.646Z" level=info msg="Queueing Succeeded workflow workflows/example-workflow-hq9pn for delete in 58m5s due to TTL"
time="2023-09-07T06:10:14.460Z" level=info msg="Queueing Succeeded workflow workflows/example-workflow-hq9pn for delete in 57m46s due to TTL"
time="2023-09-07T06:10:31.717Z" level=info msg="Queueing Succeeded workflow workflows/example-workflow-hq9pn for delete in 57m29s due to TTL"
time="2023-09-07T06:11:01.560Z" level=info msg="Queueing Succeeded workflow workflows/example-workflow-hq9pn for delete in 56m59s due to TTL"
time="2023-09-07T06:11:12.029Z" level=info msg="Queueing Succeeded workflow workflows/example-workflow-hq9pn for delete in 56m48s due to TTL"
time="2023-09-07T06:11:13.642Z" level=info msg="Queueing Succeeded workflow workflows/example-workflow-hq9pn for delete in 56m47s due to TTL"
time="2023-09-07T06:11:36.341Z" level=info msg="Queueing Succeeded workflow workflows/example-workflow-hq9pn for delete in 56m24s due to TTL"
time="2023-09-07T06:11:41.091Z" level=info msg="Queueing Succeeded workflow workflows/example-workflow-hq9pn for delete in 56m19s due to TTL"
time="2023-09-07T06:12:10.538Z" level=info msg="Queueing Succeeded workflow workflows/example-workflow-hq9pn for delete in 55m50s due to TTL"
time="2023-09-07T06:12:32.579Z" level=info msg="Queueing Succeeded workflow workflows/example-workflow-hq9pn for delete in 55m28s due to TTL"
time="2023-09-07T06:12:58.111Z" level=info msg="Queueing Succeeded workflow workflows/example-workflow-hq9pn for delete in 55m2s due to TTL"
time="2023-09-07T06:13:06.745Z" level=info msg="Queueing Succeeded workflow workflows/example-workflow-hq9pn for delete in 54m54s due to TTL"
time="2023-09-07T06:13:34.199Z" level=info msg="Queueing Succeeded workflow workflows/example-workflow-hq9pn for delete in 54m26s due to TTL"
time="2023-09-07T06:13:52.419Z" level=info msg="Queueing Succeeded workflow workflows/example-workflow-hq9pn for delete in 54m8s due to TTL"
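The repeated "Queueing … for delete" lines above suggest the controller keeps re-adding the same completed workflow to the TTL queue on every reconcile, instead of queueing it once. A quick way to quantify this from a saved controller log is a small helper like the sketch below (not part of Argo — a hypothetical script that assumes the logfmt-style messages shown above):

```python
import re
from collections import Counter

# Matches the TTL GC message emitted by the workflow controller, e.g.
# 'Queueing Succeeded workflow workflows/example-workflow-hq9pn for delete in 59m17s due to TTL'
TTL_RE = re.compile(r"Queueing \w+ workflow (?P<wf>\S+) for delete in \S+ due to TTL")

def count_ttl_requeues(log_lines):
    """Count how many times each workflow was queued for TTL deletion."""
    counts = Counter()
    for line in log_lines:
        m = TTL_RE.search(line)
        if m:
            counts[m.group("wf")] += 1
    return counts

sample = [
    'time="2023-09-07T06:08:43.765Z" level=info msg="Queueing Succeeded workflow workflows/example-workflow-hq9pn for delete in 59m17s due to TTL"',
    'time="2023-09-07T06:09:10.918Z" level=info msg="Queueing Succeeded workflow workflows/example-workflow-hq9pn for delete in 58m50s due to TTL"',
]
print(count_ttl_requeues(sample))
```

In a healthy controller each workflow should normally show up once; in the log excerpt above `workflows/example-workflow-hq9pn` appears over a dozen times in five minutes.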

@popsu
Author

popsu commented Sep 18, 2023

Thanks @jmaicher

We ended up migrating back to v3.4.4 and it's been working without issues now (6 days so far), so it seems this bug was introduced somewhere between v3.4.5 and v3.4.9.
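For reference, the controller-level default the issue describes (podGC: OnPodCompletion applied to all workflows) is typically set via `workflowDefaults` in the workflow-controller ConfigMap. A sketch of that configuration — the ConfigMap name and namespace are assumptions based on a standard install:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: workflow-controller-configmap
  namespace: argo        # assumed install namespace
data:
  workflowDefaults: |
    spec:
      podGC:
        # delete the pod as soon as it completes, regardless of outcome
        strategy: OnPodCompletion
```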

@kangyanzhou

I'm seeing a very similar issue in v3.4.10. Here's an example debug-level log trace from the workflow controller:

time="2023-09-18T20:35:59.885Z" level=debug msg="Executing node goc-ld-push-4kp96.checkout-sha of Pod is Running" namespace=phoenix workflow=goc-ld-push-4kp96
time="2023-09-18T20:35:59.885Z" level=info msg="TaskSet Reconciliation" namespace=phoenix workflow=goc-ld-push-4kp96
time="2023-09-18T20:35:59.885Z" level=info msg=reconcileAgentPod namespace=phoenix workflow=goc-ld-push-4kp96
time="2023-09-18T20:36:01.819Z" level=debug msg="Get leases 200"
time="2023-09-18T20:36:01.828Z" level=debug msg="Update leases 200"
time="2023-09-18T20:36:05.575Z" level=debug msg="Syncing all CronWorkflows"
time="2023-09-18T20:36:06.844Z" level=debug msg="Get leases 200"
time="2023-09-18T20:36:06.855Z" level=debug msg="Update leases 200"
time="2023-09-18T20:36:10.565Z" level=debug msg="Watch cronworkflows 200"
time="2023-09-18T20:36:11.872Z" level=debug msg="Get leases 200"
time="2023-09-18T20:36:11.928Z" level=debug msg="Update leases 200"
time="2023-09-18T20:36:15.559Z" level=debug msg="Check the workflow existence"
time="2023-09-18T20:36:15.575Z" level=debug msg="Syncing all CronWorkflows"
time="2023-09-18T20:36:16.939Z" level=debug msg="Get leases 200"
time="2023-09-18T20:36:16.971Z" level=debug msg="Update leases 200"
time="2023-09-18T20:36:21.978Z" level=debug msg="Get leases 200"
time="2023-09-18T20:36:21.987Z" level=debug msg="Update leases 200"
time="2023-09-18T20:36:23.477Z" level=debug msg="Watch workflowtemplates 200"
time="2023-09-18T20:36:25.576Z" level=debug msg="Syncing all CronWorkflows"
time="2023-09-18T20:36:27.003Z" level=debug msg="Get leases 200"
time="2023-09-18T20:36:27.013Z" level=debug msg="Update leases 200"
time="2023-09-18T20:36:32.022Z" level=debug msg="Get leases 200"
time="2023-09-18T20:36:32.030Z" level=debug msg="Update leases 200"
time="2023-09-18T20:36:35.570Z" level=debug msg="Watch workflows 200"
time="2023-09-18T20:36:35.576Z" level=debug msg="Syncing all CronWorkflows"
time="2023-09-18T20:36:37.039Z" level=debug msg="Get leases 200"
time="2023-09-18T20:36:37.048Z" level=debug msg="Update leases 200"
time="2023-09-18T20:36:38.193Z" level=debug msg="List workflows 200"
time="2023-09-18T20:36:42.060Z" level=debug msg="Get leases 200"
time="2023-09-18T20:36:42.066Z" level=debug msg="Update leases 200"
time="2023-09-18T20:36:45.577Z" level=debug msg="Syncing all CronWorkflows"
time="2023-09-18T20:36:47.080Z" level=debug msg="Get leases 200"
time="2023-09-18T20:36:47.096Z" level=debug msg="Update leases 200"
time="2023-09-18T20:36:52.102Z" level=debug msg="Get leases 200"
time="2023-09-18T20:36:52.112Z" level=debug msg="Update leases 200"
time="2023-09-18T20:36:55.577Z" level=debug msg="Syncing all CronWorkflows"
time="2023-09-18T20:36:57.140Z" level=debug msg="Get leases 200"
time="2023-09-18T20:36:57.150Z" level=debug msg="Update leases 200"
time="2023-09-18T20:37:02.157Z" level=debug msg="Get leases 200"
time="2023-09-18T20:37:02.169Z" level=debug msg="Update leases 200"
time="2023-09-18T20:37:04.760Z" level=info msg="Processing workflow" namespace=phoenix workflow=goc-ld-push-4kp96
time="2023-09-18T20:37:04.760Z" level=debug msg="unresolved is allowed " error=unresolved
time="2023-09-18T20:37:04.760Z" level=debug msg="unresolved is allowed " error=unresolved
time="2023-09-18T20:37:04.760Z" level=debug msg="unresolved is allowed " error=unresolved
time="2023-09-18T20:37:04.760Z" level=info msg="Task-result reconciliation" namespace=phoenix numObjs=0 workflow=goc-ld-push-4kp96
time="2023-09-18T20:37:04.760Z" level=info msg="Workflow pod is missing" namespace=phoenix nodeName=goc-ld-push-4kp96.checkout-sha nodePhase=Running recentlyStarted=false workflow=goc-ld-push-4kp96
time="2023-09-18T20:37:04.760Z" level=info msg="node goc-ld-push-4kp96-857076312 phase Running -> Error" namespace=phoenix workflow=goc-ld-push-4kp96
time="2023-09-18T20:37:04.760Z" level=info msg="node goc-ld-push-4kp96-857076312 message: pod deleted" namespace=phoenix workflow=goc-ld-push-4kp96
time="2023-09-18T20:37:04.761Z" level=info msg="node goc-ld-push-4kp96-857076312 finished: 2023-09-18 20:37:04.761002282 +0000 UTC" namespace=phoenix workflow=goc-ld-push-4kp96

From what I can see in the pod log, the pod actually succeeded without any error, but the workflow controller thought the pod was missing and therefore marked the workflow as failed.

@kangyanzhou

I can see the same issue in v3.4.7, v3.4.5, and v3.4.4 as well. I'm using EKS with Kubernetes version 1.24.

@kangyanzhou

kangyanzhou commented Sep 19, 2023

Actually, my error might be irrelevant, because our cluster has an operator that actively cleans up pods in the Succeeded phase before the workflow controller can find them and update the workflow status.

@agilgur5 agilgur5 added the area/gc Garbage collection, such as TTLs, retentionPolicy, delays, and more label Feb 18, 2024
@agilgur5 agilgur5 added this to the v3.4.x patches milestone Jun 5, 2024
@agilgur5
Contributor

agilgur5 commented Jun 5, 2024

Has anyone confirmed that this is still an issue in later 3.4.x versions, e.g. 3.4.16, 3.4.17?

@agilgur5 agilgur5 changed the title podGC: onPodCompletion not working properly in v3.4.9 v3.4.9 podGC: onPodCompletion not working properly Jun 5, 2024
@kangyanzhou

Thanks for checking! It turned out my issue was unrelated to the workflow controller, and after some changes on our end it worked fine.

@tooptoop4
Contributor

@popsu are you still seeing this on a recent version?
