
csi-s3 daemonset pod restart causes mounted pvc not accessible #29

Closed
adux6991 opened this issue Aug 18, 2022 · 7 comments
@adux6991

Problem

If the csi-s3 daemonset pod restarts for some reason, any pod that mounts an s3-based pvc loses access to the volume and reports "Transport endpoint is not connected".
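
For context: the symptom matches a FUSE mountpoint whose backing geesefs process died together with the daemonset pod, so the mount entry survives but nothing serves it anymore. A sketch of how to observe this from the affected node (the kubelet path is illustrative and differs per pod and volume):

# on the node: the dead FUSE mount is still listed
mount | grep fuse
# but accessing it fails
ls /var/lib/kubelet/pods/<pod-uid>/volumes/kubernetes.io~csi/<pv-name>/mount
# ls: cannot access '<mountpoint>': Transport endpoint is not connected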

Reproduce

  • Deploy k8s-csi-s3 following the README
cd deploy/kubernetes
kubectl create -f examples/secret.yaml
kubectl create -f provisioner.yaml
kubectl create -f attacher.yaml
kubectl create -f csi-s3.yaml
kubectl create -f examples/storageclass.yaml
  • Create pvc and pod
kubectl create -f examples/pvc.yaml
kubectl create -f examples/pod.yaml
  • Find which node the test pod is on, then delete the corresponding daemonset pod
# find which node the test pod is on
kubectl get pod -o wide
# list the csi-s3 daemonset pods
kubectl get pod -n kube-system -o wide
# delete the daemonset pod on that node to restart it
kubectl delete pod -n kube-system csi-s3-ccx2k
  • Try to access the pvc from the pod
$ kubectl exec -ti csi-s3-test-nginx -- bash
$ ls /usr/share/nginx/html/s3
ls: cannot access '/usr/share/nginx/html/s3': Transport endpoint is not connected

Found nothing special in the csi pod logs.
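
As a temporary workaround, a lazy unmount of the dead mountpoint on the node, followed by recreating the application pod, usually restores access (a sketch; the kubelet path is illustrative):

# on the node: lazily detach the dead FUSE mountpoint
umount -l /var/lib/kubelet/pods/<pod-uid>/volumes/kubernetes.io~csi/<pv-name>/mount
# recreate the application pod so the volume gets mounted afresh
kubectl delete pod csi-s3-test-nginx
kubectl create -f examples/pod.yaml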

Related

This issue describes the same problem, and its maintainer suggests that the LIST_VOLUMES and LIST_VOLUMES_PUBLISHED_NODES capabilities should be implemented. Could you please have a look?

@vitalif
Collaborator

vitalif commented Sep 9, 2022

Hi, interesting, I'll have a look

@vitalif
Collaborator

vitalif commented Sep 19, 2022

RPC_LIST_VOLUMES_PUBLISHED_NODES is now officially not a solution :-) kubernetes-csi/external-attacher#374 (comment)

I'll try to solve this problem some other way, for example by persisting mounts on each node in a "state file" and remounting them on restart. But this approach may fail, because it's not guaranteed that these "remounts" will propagate to application pods correctly. And if it fails, we'll also have to think about moving the FUSE processes out of the mounter pod.
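
For background on why propagation is the risky part: the application container received a bind mount of the original mountpoint when it started, so a FUSE mount re-created later at the same source path does not necessarily replace what the application still sees. Whether new mounts can cross mount namespaces at all depends on the propagation mode of the kubelet directory, which can be checked on the node (default kubelet path assumed):

# mounts made in one namespace only become visible in others
# if propagation is "shared"
findmnt -o TARGET,PROPAGATION --target /var/lib/kubelet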

@adux6991
Author

@vitalif I followed your footsteps across several repos, and the reason behind this issue is much clearer to me now. Thanks for your efforts, and looking forward to good news!

@PenzinAlexander

Following up on this. Any help would be much appreciated.
Thanks!

@vitalif
Collaborator

vitalif commented Nov 23, 2022

Okay, I checked the state file approach and, predictably, it doesn't work: the re-created mount isn't propagated into the application pod.
Now I want to try another approach: making csi-s3 launch geesefs processes outside the container. I'll try to do it using systemd (systemd-run or something like that), because other container "breakout" methods seem even more illegal :).
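
To illustrate the idea (a sketch only, not the actual csi-s3 code: the unit name, bucket, endpoint and mount path are made up, and reaching the host's systemd from a container typically requires mounting /run/systemd or its D-Bus socket into the container):

# start geesefs as a transient unit owned by the host's systemd,
# so it outlives the pod that issued this command
systemd-run --unit=geesefs-example-volume -- \
  geesefs --endpoint https://storage.yandexcloud.net \
  example-bucket /var/lib/kubelet/<volume-publish-path>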

@vitalif
Collaborator

vitalif commented Mar 2, 2023

I finally implemented running outside of the container using transient systemd units. The code is in the master branch, not released yet. One slight ugliness is that it still requires being run as root on the host. That's not a big deal compared to the current version, in fact: it also runs as root in a privileged container. But I still prefer to make software run without root privileges where possible... I'll probably try to add an option to geesefs to drop root privileges itself after initializing the FUSE mount; that will solve this issue.

Anyway, you can already try the new version if you build the code from the master branch yourself :)

@vitalif
Collaborator

vitalif commented Mar 7, 2023

The fix is released. Note that it's kind of strange :) because it starts geesefs outside the container, but that's what allows mountpoints to survive csi-s3 updates. The new version also no longer starts multiple geesefs processes for one volume mounted into multiple containers on one host: it starts only one geesefs per volume per host.

Try it in 0.34.7.
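
To verify the new behaviour, repeat the reproduce steps from above (a hedged check: it assumes the transient unit names contain "geesefs" and that the test pod from examples/pod.yaml is used):

# on the node: the mount should now be backed by a host-level systemd unit
systemctl list-units 'geesefs*'
# delete the csi-s3 daemonset pod on that node again...
kubectl delete pod -n kube-system csi-s3-ccx2k
# ...and the volume should stay accessible this time
kubectl exec -ti csi-s3-test-nginx -- ls /usr/share/nginx/html/s3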
