Transport endpoint is not connected when csi-s3 pod is restarted #153
Comments
Facing the same issue
Rolling out the daemonsets and the dataset-operator in the dlf namespace altogether fixed this issue for me
Actually, restarting the operator did not fix the issue for me; the only thing that fixes it is restarting the pod that uses the PVC created by the dataset operator. It would be better if the mounts were reconciled when the daemonset or operator restarts; otherwise, every time we update the CSI provider to a new version, connectivity is lost on all pods. PS: the issue happens with both the goofys and s3fs mounters.
Scenario to reproduce:
attacher logs
Can someone please take a look at this?
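For what it's worth, a quick way to tell whether a mount has gone stale from inside the consuming pod is to `stat` the mount path: a dead goofys/s3fs mount fails with "Transport endpoint is not connected". A minimal sketch, assuming a hypothetical `MOUNT` path (point it at your volume's mountPath; it defaults to a fresh temp dir here just so the snippet runs anywhere):

```shell
#!/bin/sh
# Check whether a FUSE mount is still alive.
# MOUNT is a placeholder; set it to the volume's mountPath in a real pod.
MOUNT="${MOUNT:-$(mktemp -d)}"
if stat "$MOUNT" >/dev/null 2>&1; then
  STATE=healthy
else
  # A stale goofys/s3fs mount typically errors here with
  # "Transport endpoint is not connected".
  STATE=stale
fi
echo "$MOUNT: $STATE"
```

This could be wired into a livenessProbe on the consuming pod so that the stale mount at least gets noticed automatically.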
Verified that this problem exists. To solve this, the CSI-S3 driver would need to be extended to support
This will be a sizeable development, so I am not sure about the timelines yet.
Tried adding an extra argument
RPC_LIST_VOLUMES_PUBLISHED_NODES is officially not a solution :-) kubernetes-csi/external-attacher#374 (comment)
@vitalif Thanks for researching this issue, though the answer is disappointing :-) CSI-S3 (at least Datashim's fork) uses
If you do have a workaround, I'll be happy to look into it.
Any update on when this will be resolved? We are also facing this issue, very frequently; we mount S3 into 5-6 pods. Whenever we do a read or write, we get this error and have to restart frequently.
@rrehman-hbk Could I ask under what conditions you are getting errors on reads/writes from S3 buckets? This is a different problem from the one above. If you can create an issue and post the logs from your
@srikumar003 #324
Also of note in this case: for me, if a livenessProbe kills the container, it cannot just pick up from where it left off; the whole pod must be destroyed. The CSI-S3 daemon reports the following
If the pod is subsequently restarted, the mount succeeds and all is fine again.
+1
+1 here, I think I am seeing this issue as well.
The cephfs CSI driver also encountered this FUSE process restart issue. Could this approach solve the issue for datashim/k8s-csi-s3?
If for any reason the csi-s3 pod is restarted, the Pod that uses S3 volumes loses connectivity to the mount target and we get a
Transport endpoint is not connected
error. The error goes away if we restart the pod that uses the volume; this forces the csi-s3 pod to remount the volume.
I think that when csi-s3 restarts, it should check for existing volumes and remount them.
To reproduce this behaviour, just rollout-restart the daemonset.
Could you please take a look?
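For reference, the reproduce-then-recover sequence looks roughly like this. All names are hypothetical placeholders (namespace `dlf`, daemonset `csi-s3`, consuming pod `my-app-0`); adjust to your deployment. The kubectl calls are guarded so the sketch is a no-op outside a cluster:

```shell
#!/bin/sh
# Sketch of reproducing the bug and the only workaround found so far.
# NS, the daemonset name, and the pod name are placeholders for illustration.
NS=dlf
if command -v kubectl >/dev/null 2>&1; then
  # 1. Restarting the csi-s3 daemonset kills the FUSE processes and leaves
  #    every consumer with a dead mount ("Transport endpoint is not connected").
  kubectl -n "$NS" rollout restart daemonset csi-s3
  # 2. Workaround: recreate the consuming pod so csi-s3 remounts the volume
  #    during pod startup.
  kubectl -n "$NS" delete pod my-app-0
fi
RESULT=done
echo "$RESULT"
```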