Workflow not Failed despite having Error node #10994

Open
3 tasks done
lxlxok opened this issue Apr 27, 2023 · 7 comments
Labels: area/controller, area/templates/dag, P2, type/bug

Comments

@lxlxok

lxlxok commented Apr 27, 2023

Pre-requisites

  • I have double-checked my configuration
  • I can confirm the issue exists when I tested with :latest
  • I'd like to contribute the fix myself (see contributing guide)

What happened/what you expected to happen?

[Screenshot 2023-04-27 at 12:40:16 PM]
What happened:
The workflow's status.phase is Succeeded even though one of its nodes is in the Error phase.

apiVersion: argoproj.io/v1alpha1
kind: Workflow                  # new type of k8s spec
metadata:
  generateName: hello-world-    # name of the workflow spec
spec:
  entrypoint: test          # invoke the test template
  templates:
    - name: instant-dummy
      suspend:
        duration: "0.1s"
    - name: test              # name of the template
      dag:
        tasks:
        - name: A
          template: evaluation
          arguments:
            parameters:
              - name: INPUT
                value: "abc"
        - name: B
          depends: A
          template: instant-dummy
    - name: evaluation
      steps:
      - - name: placeholder
          template: instant-dummy
          when: "false"
      inputs:
        parameters:
          - name: INPUT
      outputs:
        parameters:
          - name: OUTPUT
            valueFrom:
              expression: "inputs.parameters.INPUT.splitList(':')[1]"
  securityContext:
    runAsNonRoot: true
    runAsUser: 1001
  serviceAccountName: abc

[Screenshot 2023-04-27 at 12:41:03 PM]

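What I expected:
The workflow's status.phase should be Error whenever any of its nodes is in the Error phase. For comparison, the following manifest is identical except that task B is removed: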

apiVersion: argoproj.io/v1alpha1
kind: Workflow                  # new type of k8s spec
metadata:
  generateName: hello-world-    # name of the workflow spec
spec:
  entrypoint: test          # invoke the test template
  templates:
    - name: instant-dummy
      suspend:
        duration: "0.1s"
    - name: test              # name of the template
      dag:
        tasks:
        - name: A
          template: evaluation
          arguments:
            parameters:
              - name: INPUT
                value: "abc"
    - name: evaluation
      steps:
      - - name: placeholder
          template: instant-dummy
          when: "false"
      inputs:
        parameters:
          - name: INPUT
      outputs:
        parameters:
          - name: OUTPUT
            valueFrom:
              expression: "inputs.parameters.INPUT.splitList(':')[1]"
  securityContext:
    runAsNonRoot: true
    runAsUser: 1001
  serviceAccountName: abc

[Screenshot 2023-04-27 at 12:41:22 PM]
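
The expected invariant (a workflow whose status.phase is Succeeded should not contain any node in the Error phase) can be checked after a run. The following is a minimal sketch, assuming the Workflow object can be fetched as JSON with `kubectl get workflow <name> -n <namespace> -o json`; it reads only `status.phase` and the `phase` and `displayName` fields of the entries in `status.nodes`. The helper names `load_workflow` and `masked_error_nodes` are hypothetical, and `sample` is an abridged, hand-written illustration built from the node IDs in the controller log further down, not real output.

```python
import json
import subprocess


def load_workflow(name: str, namespace: str) -> dict:
    """Fetch a Workflow object as parsed JSON using kubectl (not called below)."""
    result = subprocess.run(
        ["kubectl", "get", "workflow", name, "-n", namespace, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


def masked_error_nodes(wf: dict) -> list:
    """Return display names of Error nodes if the workflow still reports Succeeded.

    An empty list means there is no mismatch of the kind reported in this issue.
    """
    status = wf.get("status", {})
    if status.get("phase") != "Succeeded":
        return []
    return sorted(
        node.get("displayName", node_id)
        for node_id, node in status.get("nodes", {}).items()
        if node.get("phase") == "Error"
    )


# Hypothetical, abridged status modelled on the reproduction in this issue
# (the real status also contains the StepGroup and Skipped child nodes).
sample = {
    "status": {
        "phase": "Succeeded",
        "nodes": {
            "hello-world-tnq8p": {"displayName": "hello-world-tnq8p", "phase": "Succeeded"},
            "hello-world-tnq8p-648828699": {"displayName": "A", "phase": "Error"},
            "hello-world-tnq8p-665606318": {"displayName": "B", "phase": "Omitted"},
        },
    }
}

print(masked_error_nodes(sample))  # ['A']
```

For a workflow that does not report Succeeded, the function returns an empty list by design, so it only flags the masking case.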

Version

v3.4.5 and v3.4.7

Paste a small workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.

apiVersion: argoproj.io/v1alpha1
kind: Workflow                  # new type of k8s spec
metadata:
  generateName: hello-world-    # name of the workflow spec
spec:
  entrypoint: test          # invoke the test template
  templates:
    - name: instant-dummy
      suspend:
        duration: "0.1s"
    - name: test              # name of the template
      dag:
        tasks:
        - name: A
          template: evaluation
          arguments:
            parameters:
              - name: INPUT
                value: "abc"
        - name: B
          depends: A
          template: instant-dummy
    - name: evaluation
      steps:
      - - name: placeholder
          template: instant-dummy
          when: "false"
      inputs:
        parameters:
          - name: INPUT
      outputs:
        parameters:
          - name: OUTPUT
            valueFrom:
              expression: "inputs.parameters.INPUT.splitList(':')[1]"
  securityContext:
    runAsNonRoot: true
    runAsUser: 1001

Logs from the workflow controller

kubectl logs -n argo deploy/workflow-controller | grep ${workflow}


time="2023-04-27T19:40:01.202Z" level=info msg="Processing workflow" namespace=wf-fkp-test workflow=hello-world-tnq8p
time="2023-04-27T19:40:01.208Z" level=info msg="Get configmaps 404"
time="2023-04-27T19:40:01.208Z" level=warning msg="Non-transient error: configmaps \"artifact-repositories\" not found"
time="2023-04-27T19:40:01.208Z" level=info msg="resolved artifact repository" artifactRepositoryRef=default-artifact-repository
time="2023-04-27T19:40:01.208Z" level=info msg="Updated phase  -> Running" namespace=wf-fkp-test workflow=hello-world-tnq8p
time="2023-04-27T19:40:01.208Z" level=info msg="DAG node hello-world-tnq8p initialized Running" namespace=wf-fkp-test workflow=hello-world-tnq8p
time="2023-04-27T19:40:01.208Z" level=info msg="All of node hello-world-tnq8p.A dependencies [] completed" namespace=wf-fkp-test workflow=hello-world-tnq8p
time="2023-04-27T19:40:01.208Z" level=info msg="Steps node hello-world-tnq8p-648828699 initialized Running" namespace=wf-fkp-test workflow=hello-world-tnq8p
time="2023-04-27T19:40:01.208Z" level=info msg="StepGroup node hello-world-tnq8p-2524305911 initialized Running" namespace=wf-fkp-test workflow=hello-world-tnq8p
time="2023-04-27T19:40:01.208Z" level=info msg="Skipping hello-world-tnq8p.A[0].placeholder: when 'false' evaluated false" namespace=wf-fkp-test workflow=hello-world-tnq8p
time="2023-04-27T19:40:01.208Z" level=info msg="Skipped node hello-world-tnq8p-625697480 initialized Skipped (message: when 'false' evaluated false)" namespace=wf-fkp-test workflow=hello-world-tnq8p
time="2023-04-27T19:40:01.208Z" level=info msg="Step group node hello-world-tnq8p-2524305911 successful" namespace=wf-fkp-test workflow=hello-world-tnq8p
time="2023-04-27T19:40:01.208Z" level=info msg="node hello-world-tnq8p-2524305911 phase Running -> Succeeded" namespace=wf-fkp-test workflow=hello-world-tnq8p
time="2023-04-27T19:40:01.208Z" level=info msg="node hello-world-tnq8p-2524305911 finished: 2023-04-27 19:40:01.208971904 +0000 UTC" namespace=wf-fkp-test workflow=hello-world-tnq8p
time="2023-04-27T19:40:01.208Z" level=info msg="Outbound nodes of hello-world-tnq8p-625697480 is [hello-world-tnq8p-625697480]" namespace=wf-fkp-test workflow=hello-world-tnq8p
time="2023-04-27T19:40:01.209Z" level=info msg="Outbound nodes of hello-world-tnq8p-648828699 is [hello-world-tnq8p-625697480]" namespace=wf-fkp-test workflow=hello-world-tnq8p
time="2023-04-27T19:40:01.209Z" level=error msg="Mark error node" error="invalid operation: int(string) (1:25)\n | inputs.parameters.INPUT.splitList(':')[1]\n | ........................^" namespace=wf-fkp-test nodeName=hello-world-tnq8p.A workflow=hello-world-tnq8p
time="2023-04-27T19:40:01.209Z" level=info msg="node hello-world-tnq8p-648828699 phase Running -> Error" namespace=wf-fkp-test workflow=hello-world-tnq8p
time="2023-04-27T19:40:01.209Z" level=info msg="node hello-world-tnq8p-648828699 message: invalid operation: int(string) (1:25)\n | inputs.parameters.INPUT.splitList(':')[1]\n | ........................^" namespace=wf-fkp-test workflow=hello-world-tnq8p
time="2023-04-27T19:40:01.209Z" level=info msg="node hello-world-tnq8p-648828699 finished: 2023-04-27 19:40:01.20910768 +0000 UTC" namespace=wf-fkp-test workflow=hello-world-tnq8p
time="2023-04-27T19:40:01.209Z" level=error msg="Mark error node" error="task 'hello-world-tnq8p.A' errored: invalid operation: int(string) (1:25)\n | inputs.parameters.INPUT.splitList(':')[1]\n | ........................^" namespace=wf-fkp-test nodeName=hello-world-tnq8p.A workflow=hello-world-tnq8p
time="2023-04-27T19:40:01.209Z" level=info msg="node hello-world-tnq8p-648828699 message: task 'hello-world-tnq8p.A' errored: invalid operation: int(string) (1:25)\n | inputs.parameters.INPUT.splitList(':')[1]\n | ........................^" namespace=wf-fkp-test workflow=hello-world-tnq8p
time="2023-04-27T19:40:01.209Z" level=info msg="Skipped node hello-world-tnq8p-665606318 initialized Omitted (message: omitted: depends condition not met)" namespace=wf-fkp-test workflow=hello-world-tnq8p
time="2023-04-27T19:40:01.209Z" level=info msg="Outbound nodes of hello-world-tnq8p set to [hello-world-tnq8p-665606318]" namespace=wf-fkp-test workflow=hello-world-tnq8p
time="2023-04-27T19:40:01.209Z" level=info msg="node hello-world-tnq8p phase Running -> Succeeded" namespace=wf-fkp-test workflow=hello-world-tnq8p
time="2023-04-27T19:40:01.209Z" level=info msg="node hello-world-tnq8p finished: 2023-04-27 19:40:01.209248271 +0000 UTC" namespace=wf-fkp-test workflow=hello-world-tnq8p
time="2023-04-27T19:40:01.209Z" level=info msg="Checking daemoned children of hello-world-tnq8p" namespace=wf-fkp-test workflow=hello-world-tnq8p
time="2023-04-27T19:40:01.209Z" level=info msg="TaskSet Reconciliation" namespace=wf-fkp-test workflow=hello-world-tnq8p
time="2023-04-27T19:40:01.209Z" level=info msg=reconcileAgentPod namespace=wf-fkp-test workflow=hello-world-tnq8p
time="2023-04-27T19:40:01.209Z" level=info msg="Updated phase Running -> Succeeded" namespace=wf-fkp-test workflow=hello-world-tnq8p
time="2023-04-27T19:40:01.209Z" level=info msg="Marking workflow completed" namespace=wf-fkp-test workflow=hello-world-tnq8p
time="2023-04-27T19:40:01.209Z" level=info msg="Marking workflow as pending archiving" namespace=wf-fkp-test workflow=hello-world-tnq8p
time="2023-04-27T19:40:01.209Z" level=info msg="Checking daemoned children of " namespace=wf-fkp-test workflow=hello-world-tnq8p
time="2023-04-27T19:40:01.209Z" level=info msg="Workflow to be dehydrated" Workflow Size=2520
time="2023-04-27T19:40:01.214Z" level=info msg="Create events 201"
time="2023-04-27T19:40:01.214Z" level=info msg="cleaning up pod" action=deletePod key=wf-fkp-test/hello-world-tnq8p-1340600742-agent/deletePod
time="2023-04-27T19:40:01.219Z" level=info msg="Update workflows 200"
time="2023-04-27T19:40:01.220Z" level=info msg="Create events 201"
time="2023-04-27T19:40:01.220Z" level=info msg="Delete pods 404"
time="2023-04-27T19:40:01.220Z" level=info msg="Workflow update successful" namespace=wf-fkp-test phase=Succeeded resourceVersion=156903209 workflow=hello-world-tnq8p
time="2023-04-27T19:40:01.223Z" level=info msg="DeleteCollection workflowtaskresults 200"
time="2023-04-27T19:40:01.224Z" level=info msg="archiving workflow" namespace=wf-fkp-test uid=a6c171f4-78c9-4d1c-898e-e0de4d181816 workflow=hello-world-tnq8p
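
The controller log suggests why the error is masked: node A (hello-world-tnq8p-648828699) goes Running -> Error, task B is initialized as Omitted because its depends condition is not met, the DAG's outbound nodes are set to B only, and the DAG node then goes Running -> Succeeded. The Python sketch below is a deliberately simplified, hypothetical model of that sequence, assuming the DAG phase is taken only from its outbound (leaf) tasks and that Omitted counts as non-failing. The names `Task`, `outbound`, `dag_phase` and `NON_FAILING` are illustrative; this is not Argo's controller code, and it does not model how Failed nodes are handled, which per the reporter's follow-up comment already propagate to the workflow correctly.

```python
from dataclasses import dataclass, field

# Assumption for this sketch: phases that do not fail the DAG when they
# appear on an outbound node.
NON_FAILING = {"Succeeded", "Skipped", "Omitted"}


@dataclass
class Task:
    name: str
    phase: str
    # Simplification: Argo's `depends` is an expression; here it is a plain
    # list of upstream task names.
    depends: list = field(default_factory=list)


def outbound(tasks):
    """Tasks that no other task depends on, i.e. the leaves of the DAG."""
    upstream = {dep for task in tasks for dep in task.depends}
    return [task for task in tasks if task.name not in upstream]


def dag_phase(tasks):
    """Hypothetical: derive the DAG phase from its outbound tasks only."""
    for task in outbound(tasks):
        if task.phase not in NON_FAILING:
            return task.phase
    return "Succeeded"


# First manifest: A errors, B is omitted, and B is the only outbound task.
with_b = [Task("A", "Error"), Task("B", "Omitted", depends=["A"])]
# Second manifest: A is both the only task and the only outbound task.
without_b = [Task("A", "Error")]

print(dag_phase(with_b))     # Succeeded
print(dag_phase(without_b))  # Error
```

Under this model, the Error on A never reaches the DAG's phase once an omitted task sits downstream of it, which matches the logged outcome for the first manifest.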


Logs from your workflow's wait container

N/A

@terrytangyuan
Member

Can you try v3.4.7? This might have been fixed already.

@lxlxok
Author

lxlxok commented Apr 27, 2023

Hi @terrytangyuan, thanks for the quick response. I have updated the version in this issue, since I just reproduced it in v3.4.7.

[Screenshot 2023-04-27 at 2:27:57 PM]

We only see this issue with Error nodes. With Failed nodes, Argo Workflows always marks the workflow as Failed whenever any DAG node fails.

@terrytangyuan
Member

Thanks. I am able to reproduce.

@terrytangyuan terrytangyuan added the P2 label Apr 28, 2023
@stale

stale bot commented Jun 18, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If this is a mentoring request, please provide an update here. Thank you for your contributions.

@stale stale bot added the problem/stale label Jun 18, 2023
@terrytangyuan terrytangyuan removed the problem/stale label Sep 20, 2023
@agilgur5 agilgur5 added the area/controller and area/templates/dag labels Oct 4, 2023
@JasonChen86899

JasonChen86899 commented Dec 23, 2023

Are there any updates? If not, could you assign it to me? @agilgur5

@agilgur5
Contributor

> Are there any updates?

Any updates would be in the issue. Please see https://sindresorhus.com/blog/issue-bumping & https://justinmayer.com/posts/any-updates/.

> If not, could you assign it to me? @agilgur5

You don't need to be assigned to work on something; you can open a PR directly.

@agilgur5 agilgur5 changed the title "Argo workflow doesn't failed for Error node" to "Workflow not Failed despite having Error node" Feb 5, 2024
Projects: None yet
Development: No branches or pull requests
5 participants