unable to read csi driver mounted file in OpenShift #54

Open

raffaelespazzoli opened this issue Oct 31, 2022 · 20 comments · May be fixed by #105

Comments

@raffaelespazzoli

I installed this CSI driver in OpenShift and it does not work. I can see the Unix socket mounted in the container, but I get this error:

drwxr-xr-x.   2 root root   60 Oct 29 20:28 spiffe-workload-api
drwxr-xr-x.   2 root root    6 Jun 21  2021 srv
dr-xr-xr-x.  13 root root    0 Oct 29 19:14 sys
drwxrwxrwx.   2 root root   58 Oct 26 11:23 tmp
drwxr-xr-x.  12 root root  144 Oct 26 11:09 usr
drwxr-xr-x.  19 root root  249 Oct 26 11:09 var
sh-4.4$ ls -la spiffe-workload-api
ls: cannot open directory 'spiffe-workload-api': Permission denied
sh-4.4$ id
uid=1000690000(1000690000) gid=0(root) groups=0(root),1000690000

As you can see, based on the group, I should be able to ls that directory. I suspect SELinux may be at play here. Any suggestions?
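
One way to confirm the SELinux suspicion is to look for AVC denials on the node (a sketch, assuming node access via oc debug and that auditd is logging; the node name is a placeholder):

oc debug node/<node-name>
chroot /host
ausearch -m avc -ts recent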

This is also failing:

/opt/spire $ /opt/spire/bin/spire-agent api fetch -socketPath $SPIFFE_ENDPOINT_SOCKET
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial unix /opt/spire/unix:/spiffe-workload-api/spire-agent.sock: connect: no such file or directory"
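
The doubled path in that error ("/opt/spire/" prepended to "unix:/...") suggests the socket address carries a unix: scheme, which -socketPath treats as a relative filesystem path. Passing a plain path may help (a sketch, assuming the socket is mounted at /spiffe-workload-api):

/opt/spire/bin/spire-agent api fetch -socketPath /spiffe-workload-api/spire-agent.sock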

see also spiffe/spire-tutorials#95

@raffaelespazzoli
Author

@azdagron my deployment meets your requirements:

        - name: spiffe-csi-driver
          image: ghcr.io/spiffe/spiffe-csi-driver:nightly

and:

        volumeMounts:
        - name: spiffe-workload-api
          mountPath: /spiffe-workload-api
          readOnly: true
      volumes:
      - name: spiffe-workload-api
        csi:
          driver: "csi.spiffe.io"
          readOnly: true

@raffaelespazzoli
Author

I found proof that selinux is denying access

time->Fri Nov  4 17:52:55 2022
type=PROCTITLE msg=audit(1667584375.484:68): proctitle=6C73002D6C61002F7370696666652D776F726B6C6F61642D617069
type=SYSCALL msg=audit(1667584375.484:68): arch=c000003e syscall=2 success=no exit=-13 a0=7ffedde6edda a1=98000 a2=0 a3=0 items=0 ppid=266580 pid=266767 auid=4294967295 uid=1000680000 gid=0 euid=1000680000 suid=1000680000 fsuid=1000680000 egid=0 sgid=0 fsgid=0 tty=pts0 ses=4294967295 comm="ls" exe="/bin/busybox" subj=system_u:system_r:container_t:s0:c15,c26 key=(null)
type=AVC msg=audit(1667584375.484:68): avc:  denied  { read } for  pid=266767 comm="ls" name="agent-sockets" dev="tmpfs" ino=1257099 scontext=system_u:system_r:container_t:s0:c15,c26 tcontext=system_u:object_r:container_var_run_t:s0 tclass=dir permissive=0
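
For reference, a denial like this can be converted into a local policy module with audit2allow (a blunt, node-level workaround sketch; the module name is illustrative):

ausearch -m avc -ts recent | audit2allow -M spiffe_csi
semodule -i spiffe_csi.pp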

@azdagron
Member

azdagron commented Nov 4, 2022

I don't have any experience running the CSI driver in OpenShift. Perhaps @bendikp or @erikgb have some insight into SELinux policies in OpenShift that might impact this (they previously filed issue #42 related to failures in OpenShift due to SELinux).

@erikgb

erikgb commented Nov 4, 2022

@azdagron Thanks for the ping! I was supposed to follow up on this, but have had a lot of other things to do lately. 😅 It turns out this is not specific to OpenShift: I was able to reproduce the same problem in Rancher RKE2 with SELinux enabled. I wonder if something can be done to mend this in the SPIFFE CSI driver? 🤔 I suspect it is related to the fact that we now mount the socket read-write, but require the user to configure it as read-only. But I am mainly a developer, not a sysadmin, and I have limited experience with SELinux (except frustrations about non-working stuff). 😉

In the meantime, I managed to work around this problem (on RKE2) by installing https://github.com/kubernetes-sigs/security-profiles-operator, and ensuring the workloads that try to mount the socket via the SPIFFE CSI driver have the following resource bound:

apiVersion: security-profiles-operator.x-k8s.io/v1alpha2
kind: SelinuxProfile
metadata:
  name: spiffe-csi-socket-access  # name is illustrative
spec:
  allow:
    container_var_run_t:
      sock_file:
      - write
  inherit:
  - kind: System
    name: container

We were supposed to try a similar approach on our OpenShift clusters, but I think other tasks got in the way. Maybe @bendikp has something to add?
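
For completeness, security-profiles-operator profiles are attached through the pod's securityContext, and the exact usage string is published on the profile's status. A sketch, assuming the profile above is named spiffe-csi-socket-access and installed in my-namespace:

securityContext:
  seLinuxOptions:
    type: spiffe-csi-socket-access_my-namespace.process  # value comes from the profile's status.usage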

@raffaelespazzoli
Author

Ping: is there any update on this issue? Has it been resolved?

@bendikp

bendikp commented Jan 26, 2023

I haven't had time to test the CSI driver with the proper SELinux profile on OpenShift yet. I'll try to find some time to test this soon.

azdagron changed the title from "unable to read csi driver mounted file" to "unable to read csi driver mounted file in OpenShift" on Feb 25, 2023
@sjberman

sjberman commented Mar 30, 2023

Not sure if this helps, but we run an Ubuntu init container to set the SELinux context in OpenShift.

initContainers:
- name: set-context
  image: ubuntu:22.04
  command: ["chcon", "-Rt", "container_file_t", "/spire-agent-socket/"]
  volumeMounts:
  - name: spire-agent-socket
    mountPath: /spire-agent-socket
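
If the relabel works, the directory and socket should show container_file_t afterwards, which is the type container processes are allowed to access (quick check from inside a pod):

ls -laZ /spire-agent-socket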

@erikgb

erikgb commented Mar 30, 2023

Not sure if this helps, but we run an Ubuntu init container to set the SELinux context in OpenShift.

@sjberman Thanks for sharing! Brilliant idea that we will test in our clusters. 👍👏

@erikgb

erikgb commented Apr 1, 2023

I wonder if this can be fixed somehow inside the CSI driver using https://pkg.go.dev/github.com/opencontainers/selinux/go-selinux, to avoid the init container. WDYT @azdagron?
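
Something along these lines might work inside the driver (a minimal sketch, not the actual PR; it assumes the go-selinux package and reuses the same label the chcon workaround sets, and the mount path is hypothetical):

package main

import (
	"log"

	"github.com/opencontainers/selinux/go-selinux"
)

// relabelSocketDir applies the container-accessible SELinux label to the
// Workload API socket directory so workload containers can reach the socket.
func relabelSocketDir(dir string) {
	// No-op on nodes where SELinux is disabled.
	if !selinux.GetEnabled() {
		return
	}
	// Recursively set the same label the chcon init container applies.
	if err := selinux.Chcon(dir, "system_u:object_r:container_file_t:s0", true); err != nil {
		// Warn rather than fail, so non-SELinux setups keep working.
		log.Printf("warning: failed to relabel %s: %v", dir, err)
	}
}

func main() {
	relabelSocketDir("/spire-agent-socket") // hypothetical mount path
}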

@bortek

bortek commented May 10, 2023

A "sysadmin" chiming in here with the developer breathing my neck :) Any more updates or resolution on this?

We have the same issue in openshift 4.11 and these are the erros from the audit.log file where I can see that it tries to write and gets deny

Any idea what can we try to get around this?

type=AVC msg=audit(1683718046.290:4085): avc: denied { write } for pid=1930089 comm="spire-helper" name="spire-agent.sock" dev="tmpfs" ino=1787074296 scontext=system_u:system_r:container_t:s0:c15,c31 tcontext=system_u:object_r:container_var_run_t:s0 tclass=sock_file permissive=0

type=SYSCALL msg=audit(1683718046.290:4085): arch=c000003e syscall=42 success=no exit=-13 a0=7 a1=c0001f2310 a2=28 a3=b18a175e items=0 ppid=1930041 pid=1930089 auid=4294967295 uid=1000960000 gid=0 euid=1000960000 suid=1000960000 fsuid=1000960000 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="spire-helper" exe="/usr/local/bin/spire-helper" subj=system_u:system_r:container_t:s0:c15,c31 key=(null)ARCH=x86_64 SYSCALL=connect AUID="unset" UID="unknown(1000960000)" GID="root" EUID="unknown(1000960000)" SUID="unknown(1000960000)" FSUID="unknown(1000960000)" EGID="root" SGID="root" FSGID="root"

@erikgb

erikgb commented May 11, 2023

@bortek I can confirm that the workaround suggested by @sjberman in #54 (comment) works like a charm in our OpenShift clusters. It would be nice if this could be handled inside the CSI driver, but the workaround is acceptable for us at present.

@azdagron
Member

If the functionality provided by the init container can be done within the CSI driver, I'm all for that. I don't have a lot of time to dedicate to this component at the moment, so.... patches welcome? :)

@bortek

bortek commented May 12, 2023

We are verifying this at the moment. Will get back shortly with the findings.

@bortek

bortek commented May 12, 2023

It seems to work in OpenShift 4.11.37. I have made a PR with the initContainer:

#104

I think I put it in the right place and even managed to get the indents correct.

Someone please double-check and review/approve.

azdagron added a commit that referenced this issue May 14, 2023
This allows the driver to be used within OpenShift without using an init
container to set the label.

Fixes #54.

Signed-off-by: Andrew Harding <[email protected]>

azdagron linked a pull request May 14, 2023 that will close this issue (#105)

@azdagron
Member

I put together a PR to set the label from within the CSI driver. It does require that the CSI driver mount the Workload API socket directory volume read-write so it can apply the label, which is a departure from the existing deployment examples. Because of that, if the driver fails to apply the label, it logs a warning and keeps chugging along.
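
For anyone trying it out, the relevant deployment change looks roughly like this (illustrative excerpt; the volume and path names are assumptions, not the actual manifest from the PR):

volumeMounts:
- name: spire-agent-socket-dir
  mountPath: /spire-agent-socket
  readOnly: false  # previously mounted read-only in the examples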

I could use some help testing this in OpenShift if somebody has some cycles.

azdagron added a commit that referenced this issue May 29, 2023
This allows the driver to be used within OpenShift without using an init
container to set the label.

Fixes #54.

Signed-off-by: Andrew Harding <[email protected]>
@kfox1111
Contributor

kfox1111 commented Nov 1, 2023

We just added basic OpenShift support to the helm-charts-hardened project: spiffe/helm-charts-hardened#13

Ingress support is coming soon, and I think a release happens after that. So, soon.

@erikgb

erikgb commented Nov 1, 2023

We just added basic OpenShift support to the helm-charts-hardened project: spiffe/helm-charts-hardened#13

Ingress support is coming soon, and I think a release happens after that. So, soon.

But this is not an OpenShift problem. It's a problem whenever SELinux is enabled on nodes, which OpenShift does by default; I have reproduced the same issue on Rancher RKE2 (with SELinux enabled).

I must say I am not a big fan of Helm charts tailored for OpenShift (openshift: true), but I'll leave that decision to the Helmers. 😉 But this issue is NOT an OpenShift issue IMO.

@kfox1111
Contributor

kfox1111 commented Nov 1, 2023

True. The openshift flag is there to help set defaults so you don't have to change all the things that vary on OpenShift. That doesn't mean the features can't be used outside of OpenShift, or couldn't be made easier to use. Would an selinux: true flag help?

@erikgb

erikgb commented Nov 1, 2023

As far as I can see, the only change relevant for this issue is the ability to add an init container to perform the same hack we are currently using as a workaround. IMO SCCs, which ARE an OpenShift thing, should be controlled externally to the Helm chart. They are an extra layer of security that only makes sense if controlled separately from the provisioning of the app, a bit like SELinux (inside Kubernetes).

@erikgb

erikgb commented Nov 1, 2023

This issue could/should be solved inside the CSI driver, as @azdagron has tried to do in #105, but the last time I tested the proposed changes, they did not solve the issue. Users of a Kubernetes/OpenShift cluster should ideally not have to bother with changing SELinux policies on nodes.
