Node should fail when there is ErrImagePull #4229

Closed
cotterpl opened this issue Oct 7, 2020 · 3 comments

cotterpl commented Oct 7, 2020

Summary

I prepared a workflow with a loop. The container image run in the loop cannot be pulled because it does not exist, so Kubernetes returns ErrImagePull.

What happened

The step is stuck in Pending status, and the workflow is stuck in Running status.

Expected to happen

The step fails, and the workflow fails.

Diagnostics

What Kubernetes provider are you using?

Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.8", GitCommit:"9f2892aab98fe339f3bd70e3c470144299398ace", GitTreeState:"clean", BuildDate:"2020-08-13T16:12:48Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.8", GitCommit:"9f2892aab98fe339f3bd70e3c470144299398ace", GitTreeState:"clean", BuildDate:"2020-08-13T16:04:18Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}

Running on Docker Desktop for Mac.

What version of Argo Workflows are you running?

argo: v2.11.0
  BuildDate: 2020-09-17T21:04:31Z
  GitCommit: f8e750de5ebab6f3c494c972889b31ef24c73c9b
  GitTreeState: clean
  GitTag: v2.11.0
  GoVersion: go1.13.15
  Compiler: gc
  Platform: darwin/amd64

Workflow to reproduce

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: loops-maps-
spec:
  entrypoint: loop-map
  templates:
    - name: loop-map
      steps:
        - - name: preprocess
            template: preprocess-source
            arguments:
              parameters:
                - name: source
                  value: "{{item.source}}"
            withItems:
              - { source: '00000039' }
              - { source: '00000536' }
    - name: preprocess-source
      inputs:
        parameters:
          - name: source
      container:
        image: can-not-pull-me
        command: [ python, /worker.py ]
        args: [ "{{inputs.parameters.source}}" ]

kubectl get wf -o yaml loops-maps-8n25j

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  creationTimestamp: "2020-10-07T11:08:06Z"
  generateName: loops-maps-
  generation: 23
  labels:
    workflows.argoproj.io/creator: system-serviceaccount-argo-argo-server
    workflows.argoproj.io/phase: Running
  managedFields:
  - apiVersion: argoproj.io/v1alpha1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:generateName: {}
        f:labels:
          .: {}
          f:workflows.argoproj.io/creator: {}
      f:spec:
        .: {}
        f:arguments: {}
        f:entrypoint: {}
        f:templates: {}
      f:status:
        .: {}
        f:finishedAt: {}
    manager: argo
    operation: Update
    time: "2020-10-07T11:08:06Z"
  - apiVersion: argoproj.io/v1alpha1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:labels:
          f:workflows.argoproj.io/phase: {}
      f:status:
        f:nodes:
          .: {}
          f:loops-maps-8n25j:
            .: {}
            f:children: {}
            f:displayName: {}
            f:finishedAt: {}
            f:id: {}
            f:name: {}
            f:phase: {}
            f:startedAt: {}
            f:templateName: {}
            f:templateScope: {}
            f:type: {}
          f:loops-maps-8n25j-409258822:
            .: {}
            f:boundaryID: {}
            f:displayName: {}
            f:finishedAt: {}
            f:hostNodeName: {}
            f:id: {}
            f:inputs:
              .: {}
              f:parameters: {}
            f:message: {}
            f:name: {}
            f:phase: {}
            f:startedAt: {}
            f:templateName: {}
            f:templateScope: {}
            f:type: {}
          f:loops-maps-8n25j-1148271936:
            .: {}
            f:boundaryID: {}
            f:children: {}
            f:displayName: {}
            f:finishedAt: {}
            f:id: {}
            f:name: {}
            f:phase: {}
            f:startedAt: {}
            f:templateName: {}
            f:templateScope: {}
            f:type: {}
          f:loops-maps-8n25j-3586297921:
            .: {}
            f:boundaryID: {}
            f:displayName: {}
            f:finishedAt: {}
            f:hostNodeName: {}
            f:id: {}
            f:inputs:
              .: {}
              f:parameters: {}
            f:message: {}
            f:name: {}
            f:phase: {}
            f:startedAt: {}
            f:templateName: {}
            f:templateScope: {}
            f:type: {}
        f:phase: {}
        f:startedAt: {}
    manager: workflow-controller
    operation: Update
    time: "2020-10-07T11:14:29Z"
  name: loops-maps-8n25j
  namespace: argo
  resourceVersion: "94389"
  selfLink: /apis/argoproj.io/v1alpha1/namespaces/argo/workflows/loops-maps-8n25j
  uid: 0b47673c-dfa5-411f-87e3-00a7471dbc87
spec:
  arguments: {}
  entrypoint: loop-map
  templates:
  - arguments: {}
    inputs: {}
    metadata: {}
    name: loop-map
    outputs: {}
    steps:
    - - arguments:
          parameters:
          - name: source
            value: '{{item.source}}'
        name: preprocess
        template: preprocess-source
        withItems:
        - source: "00000039"
        - source: "00000536"
  - arguments: {}
    container:
      args:
      - '{{inputs.parameters.source}}'
      command:
      - python
      - /worker.py
      image: can-not-pull-me
      name: ""
      resources: {}
    inputs:
      parameters:
      - name: source
    metadata: {}
    name: preprocess-source
    outputs: {}
status:
  finishedAt: null
  nodes:
    loops-maps-8n25j:
      children:
      - loops-maps-8n25j-1148271936
      displayName: loops-maps-8n25j
      finishedAt: null
      id: loops-maps-8n25j
      name: loops-maps-8n25j
      phase: Running
      startedAt: "2020-10-07T11:08:06Z"
      templateName: loop-map
      templateScope: local/loops-maps-8n25j
      type: Steps
    loops-maps-8n25j-409258822:
      boundaryID: loops-maps-8n25j
      displayName: preprocess(0:source:00000039)
      finishedAt: null
      hostNodeName: docker-desktop
      id: loops-maps-8n25j-409258822
      inputs:
        parameters:
        - name: source
          value: "00000039"
      message: 'ImagePullBackOff: Back-off pulling image "can-not-pull-me"'
      name: loops-maps-8n25j[0].preprocess(0:source:00000039)
      phase: Pending
      startedAt: "2020-10-07T11:08:06Z"
      templateName: preprocess-source
      templateScope: local/loops-maps-8n25j
      type: Pod
    loops-maps-8n25j-1148271936:
      boundaryID: loops-maps-8n25j
      children:
      - loops-maps-8n25j-409258822
      - loops-maps-8n25j-3586297921
      displayName: '[0]'
      finishedAt: null
      id: loops-maps-8n25j-1148271936
      name: loops-maps-8n25j[0]
      phase: Running
      startedAt: "2020-10-07T11:08:06Z"
      templateName: loop-map
      templateScope: local/loops-maps-8n25j
      type: StepGroup
    loops-maps-8n25j-3586297921:
      boundaryID: loops-maps-8n25j
      displayName: preprocess(1:source:00000536)
      finishedAt: null
      hostNodeName: docker-desktop
      id: loops-maps-8n25j-3586297921
      inputs:
        parameters:
        - name: source
          value: "00000536"
      message: 'ImagePullBackOff: Back-off pulling image "can-not-pull-me"'
      name: loops-maps-8n25j[0].preprocess(1:source:00000536)
      phase: Pending
      startedAt: "2020-10-07T11:08:06Z"
      templateName: preprocess-source
      templateScope: local/loops-maps-8n25j
      type: Pod
  phase: Running
  startedAt: "2020-10-07T11:08:06Z"
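
The stuck nodes in the status above can be picked out mechanically. The following is a minimal sketch, assuming the workflow has been loaded into a dictionary with the same layout as the YAML above. `stuck_image_pull_nodes` and `IMAGE_PULL_REASONS` are hypothetical names used for illustration, not part of Argo.

```python
from typing import Any, Dict, List

# Reasons seen in the node messages above; the kubelet may report others.
IMAGE_PULL_REASONS = ("ErrImagePull", "ImagePullBackOff")


def stuck_image_pull_nodes(workflow: Dict[str, Any]) -> List[str]:
    """Return display names of Pending pod nodes whose message reports an image pull failure.

    Hypothetical helper for illustration only.
    """
    nodes = workflow.get("status", {}).get("nodes") or {}
    stuck = []
    for node in nodes.values():
        message = node.get("message") or ""
        if (
            node.get("type") == "Pod"
            and node.get("phase") == "Pending"
            and message.startswith(IMAGE_PULL_REASONS)
        ):
            stuck.append(node.get("displayName", ""))
    return stuck


# Trimmed copy of the status shown above.
workflow = {
    "status": {
        "phase": "Running",
        "nodes": {
            "loops-maps-8n25j": {
                "displayName": "loops-maps-8n25j",
                "phase": "Running",
                "type": "Steps",
            },
            "loops-maps-8n25j-409258822": {
                "displayName": "preprocess(0:source:00000039)",
                "message": 'ImagePullBackOff: Back-off pulling image "can-not-pull-me"',
                "phase": "Pending",
                "type": "Pod",
            },
            "loops-maps-8n25j-3586297921": {
                "displayName": "preprocess(1:source:00000536)",
                "message": 'ImagePullBackOff: Back-off pulling image "can-not-pull-me"',
                "phase": "Pending",
                "type": "Pod",
            },
        },
    }
}

print(stuck_image_pull_nodes(workflow))
```

In practice the dictionary could come from parsing the output of `kubectl get wf -o json loops-maps-8n25j` with `json.loads`.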

Message from the maintainers:

Impacted by this bug? Give it a 👍. We prioritise the issues with the most 👍.

alexec changed the title from "Docker image can not be pulled but workflow is still Running" to "Node should fail running even when there is ErrImagePull" on Oct 7, 2020
alexec changed the title from "Node should fail running even when there is ErrImagePull" to "Node should fail when there is ErrImagePull" on Oct 7, 2020

alexec commented Oct 7, 2020

I'm not 100% sure this is a bug. The node should stay in Pending if the pod is pending. Can you please attach the pod status?


cotterpl commented Oct 8, 2020

The pod status is Pending and the container status is Waiting. It seems you are correct and this is not a bug, as Kubernetes keeps trying (and retrying) to pull the image.

Name:         loops-maps-r7bln-3556965675
Namespace:    argo
Priority:     0
Node:         docker-desktop/192.168.65.3
Start Time:   Thu, 08 Oct 2020 08:18:08 +0200
Labels:       workflows.argoproj.io/completed=false
              workflows.argoproj.io/workflow=loops-maps-r7bln
Annotations:  workflows.argoproj.io/node-name: loops-maps-r7bln[0].preprocess(1:source:00000536)
              workflows.argoproj.io/template:
                {"name":"preprocess-source","arguments":{},"inputs":{"parameters":[{"name":"source","value":"00000536"}]},"outputs":{},"metadata":{},"cont...
Status:       Pending
IP:           10.1.3.54
IPs:
  IP:           10.1.3.54
Controlled By:  Workflow/loops-maps-r7bln
Containers:
  wait:
    Container ID:  docker://904f8425c75ec614fcef767cb5a007ab1fdb7990c0b6c2f47aa18c1e4603426b
    Image:         argoproj/argoexec:v2.11.1
    Image ID:      docker-pullable://argoproj/argoexec@sha256:574f8eb926820149bc98c4fc6b3c3b48ecdf6046f4f325643024e5684a0653a0
    Port:          <none>
    Host Port:     <none>
    Command:
      argoexec
      wait
    State:          Running
      Started:      Thu, 08 Oct 2020 08:18:09 +0200
    Ready:          True
    Restart Count:  0
    Environment:
      ARGO_POD_NAME:  loops-maps-r7bln-3556965675 (v1:metadata.name)
    Mounts:
      /argo/podmetadata from podmetadata (rw)
      /var/run/docker.sock from docker-sock (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-zwr64 (ro)
  main:
    Container ID:  
    Image:         can-not-pull-me
    Image ID:      
    Port:          <none>
    Host Port:     <none>
    Command:
      python
      /worker.py
    Args:
      00000536
    State:          Waiting
      Reason:       ImagePullBackOff
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-zwr64 (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  podmetadata:
    Type:  DownwardAPI (a volume populated by information about the pod)
    Items:
      metadata.annotations -> annotations
  docker-sock:
    Type:          HostPath (bare host directory volume)
    Path:          /var/run/docker.sock
    HostPathType:  Socket
  default-token-zwr64:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-zwr64
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                     From                     Message
  ----     ------     ----                    ----                     -------
  Normal   Scheduled  <unknown>               default-scheduler        Successfully assigned argo/loops-maps-r7bln-3556965675 to docker-desktop
  Normal   Pulled     8m45s                   kubelet, docker-desktop  Container image "argoproj/argoexec:v2.11.1" already present on machine
  Normal   Created    8m45s                   kubelet, docker-desktop  Created container wait
  Normal   Started    8m45s                   kubelet, docker-desktop  Started container wait
  Warning  Failed     7m20s (x5 over 8m43s)   kubelet, docker-desktop  Error: ImagePullBackOff
  Normal   Pulling    7m5s (x4 over 8m45s)    kubelet, docker-desktop  Pulling image "can-not-pull-me"
  Warning  Failed     7m4s (x4 over 8m43s)    kubelet, docker-desktop  Failed to pull image "can-not-pull-me": rpc error: code = Unknown desc = Error response from daemon: pull access denied for can-not-pull-me, repository does not exist or may require 'docker login': denied: requested access to the resource is denied
  Warning  Failed     7m4s (x4 over 8m43s)    kubelet, docker-desktop  Error: ErrImagePull
  Normal   BackOff    3m39s (x20 over 8m43s)  kubelet, docker-desktop  Back-off pulling image "can-not-pull-me"
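
The state described above (the pod is Pending while the `main` container is Waiting with reason ImagePullBackOff) can also be checked from the pod object itself. This is a minimal sketch, assuming the pod has been fetched as JSON (for example with `kubectl get pod -o json`) and parsed into a dictionary that follows the standard `status.containerStatuses[].state.waiting.reason` layout. `image_pull_failures` is a hypothetical helper name, and the sample data is a trimmed reconstruction of the pod described above.

```python
from typing import Any, Dict

# Waiting reasons that appear in the events above.
IMAGE_PULL_REASONS = ("ErrImagePull", "ImagePullBackOff")


def image_pull_failures(pod: Dict[str, Any]) -> Dict[str, str]:
    """Map container name to waiting reason for containers stuck on an image pull.

    Hypothetical helper for illustration only.
    """
    failures = {}
    statuses = pod.get("status", {}).get("containerStatuses") or []
    for container in statuses:
        # A running or terminated container has no "waiting" entry.
        waiting = container.get("state", {}).get("waiting") or {}
        reason = waiting.get("reason", "")
        if reason in IMAGE_PULL_REASONS:
            failures[container.get("name", "")] = reason
    return failures


# Trimmed reconstruction of the pod described above.
pod = {
    "status": {
        "phase": "Pending",
        "containerStatuses": [
            {
                "name": "wait",
                "image": "argoproj/argoexec:v2.11.1",
                "state": {"running": {"startedAt": "2020-10-08T06:18:09Z"}},
            },
            {
                "name": "main",
                "image": "can-not-pull-me",
                "state": {"waiting": {"reason": "ImagePullBackOff"}},
            },
        ],
    }
}

print(image_pull_failures(pod))
```

Because the kubelet alternates between ErrImagePull and ImagePullBackOff while it retries, the sketch treats both reasons the same way.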

@jhammarstedt

I am facing a similar issue where our pods stay pending. Is there a way to configure it to exit on ErrImagePull?
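
No answer to this question is recorded in the thread. As a rough sketch of one possible workaround outside Argo itself, an external watchdog could treat an image pull wait as fatal once a node has been Pending longer than a grace period. Everything here is an assumption for illustration: the helper names `parse_k8s_time` and `should_give_up`, the five-minute default, and the idea of reading `phase`, `message`, and `startedAt` from the workflow's node status as shown earlier in the issue.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict

IMAGE_PULL_REASONS = ("ErrImagePull", "ImagePullBackOff")


def parse_k8s_time(value: str) -> datetime:
    """Parse a timestamp such as "2020-10-07T11:08:06Z" into an aware UTC datetime."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def should_give_up(
    node: Dict[str, Any],
    now: datetime,
    grace: timedelta = timedelta(minutes=5),  # assumed default, tune as needed
) -> bool:
    """Return True if the node has waited on an image pull for longer than `grace`."""
    message = node.get("message") or ""
    if node.get("phase") != "Pending" or not message.startswith(IMAGE_PULL_REASONS):
        return False
    return now - parse_k8s_time(node["startedAt"]) > grace


# Node values copied from the workflow status earlier in the issue.
node = {
    "phase": "Pending",
    "message": 'ImagePullBackOff: Back-off pulling image "can-not-pull-me"',
    "startedAt": "2020-10-07T11:08:06Z",
}

print(should_give_up(node, now=datetime.now(timezone.utc)))
```

The grace period matters because, as noted above, Kubernetes keeps retrying the pull, so a short wait may simply mean the image is not available yet. What to do once the function returns True (for example stopping the workflow) is left out of the sketch.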
