Set container file label on workload API directory #105

azdagron · 2023-05-14T00:53:09Z

This allows the driver to be used within OpenShift without using an init container to set the label.

This PR attempts to set the label unconditionally. It will fail on existing deployments that mount the Workload API socket directory read-only into the CSI driver container. Results are logged.

Fixes #54.

erikgb

Thanks, @azdagron! I would like to test this in our clusters with enforcing SELinux. Are images from PRs available from CI, or do I have to build it myself to test? I have been able to reproduce the issue on Openshift, but also on Rancher RKE2 with SELinux enforcing.

erikgb · 2023-05-14T13:26:17Z

test/config/spiffe-csi-driver.yaml

@@ -50,7 +50,7 @@ spec:
            # driver will mount this directory into containers.
            - mountPath: /spire-agent-socket
              name: spire-agent-socket-dir
-              readOnly: true
+                #readOnly: true


Suggested change

#readOnly: true

Or we could explicitly set it to false, with a comment about why?

Oh whoops, I didn't mean to leave this commented line in. But yes, I agree that being explicit and adding a comment would be very helpful and self-documenting.

erikgb · 2023-05-14T13:33:01Z

pkg/driver/driver.go

+	// Set the SELinux label on the workload API directory. This allows the
+	// mount to be used within OpenShift, for example. This will fail if the
+	// Workload API socket directory is mounted read-only.
+	if err := chcon(config.WorkloadAPISocketDir, seLinuxContainerFileLabel, true); err != nil {


Would it make sense to check if SELinux is in enforcing mode before attempting this? That would allow users without enforcing SELinux to use a read-only mount and avoid log errors.

That seems reasonable. I'll look into how to do that.

azdagron · 2023-05-14T13:40:25Z

Are images from PRs available from CI, or do I have to build it myself to test?

Unfortunately not at the moment. You'll have to build it yourself. Once it is merged, the nightly workflow will publish the image.

azdagron · 2023-05-29T02:51:45Z

Hey, @erikgb ! I'd love to get this into a release soon. Have you had a chance to give this a go, by chance?

erikgb · 2023-05-29T11:50:26Z

Hey, @erikgb ! I'd love to get this into a release soon. Have you had a chance to give this a go, by chance?

Hi @azdagron! I had a look at building the image myself a couple of weeks ago. And the combination of a quite complex image build and me sitting on a corporate network appeared like a bad combination. 😉 But after your last poke, I did another attempt. And I think I managed to build the image from a local mirror with some minor adjustments to the build process.

I started off the upstream main branch, just to see that things were working as before. As I cannot "play around" in our Openshift clusters, I had to use our development RKE2 cluster - where I have reproduced the problem earlier.

Build image from main branch, no change in config: Permission denied in the example-workload pod.
Build image from your PR branch, no change in config: Same behavior in the example-workload pod. No error in the spiffe-csi-driver-container.
Still use image from your PR branch, change volumeMount to read-write: Same behavior in the example-workload pod. No error in the spiffe-csi-driver-container.
Add init container that runs chcon: The example-workload pod can connect to the agent socket and obtain it's identities.

So my testing so far shows that this PR does NOT currently fix the problem. Are you able to add some more logging to verify that I haven't messed up my local image build? Or maybe try it yourself in a cluster with SELinux enabled? I'll recommend Rancher RKE2 which we run in a single node configuration for early testing like this.

erikgb · 2023-05-29T12:12:02Z

After removing the init-container, this appears to work, but I'll guess that's because the SELinux settings are already applied on my node. If I change the host path for the agent sockets to a new directory, I'm back to the error in the example-workload: transport: Error while dialing: dial unix /spiffe-workload-api/agent.sock: connect: permission denied.

With the init-container in place this works again:

2023/05/29 12:09:59 Update:
2023-05-29T14:09:59.578614750+02:00 2023/05/29 12:09:59   SVIDs:
2023-05-29T14:09:59.578619524+02:00 2023/05/29 12:09:59     spiffe://infra-dev.mycompany.com/ns/egb-test/sa/default
2023-05-29T14:09:59.578622633+02:00 2023/05/29 12:09:59   Bundles:
2023-05-29T14:09:59.578626048+02:00 2023/05/29 12:09:59     infra-dev.mycompany.com (1 authorities)

azdagron · 2023-05-29T13:01:17Z

Thanks for giving that a go @erikgb!

I've shuffled the code to unconditionally log SELinux status before attempting to do the chcon:

If you build and load the image correctly, you should get a log line like the following (except my status shows not enabled)

INFO	driver/driver.go:66	SELinux status	{"enabled": false, "enforceMode": -1, "processLabel": "", "fileLabel": ""}

Note that make docker-build does not end up loading the docker image into your local registry. You have to run docker load-images afterwards for that.

If you have time to give it a whirl, that would be helpful. In the meantime I'm going to attempt to get RKE2 up.

This allows the driver to be used within OpenShift without using an init container to set the label. Fixes #54. Signed-off-by: Andrew Harding <[email protected]>

Signed-off-by: Andrew Harding <[email protected]>

erikgb · 2023-05-29T13:37:06Z

@azdagron The added logging is interesting:

2023-05-29T13:34:16.189Z	INFO	spiffe-csi-driver/main.go:45	Starting.	{"version": "0.2.3-dev", "nodeID": "h1-a-iaut-kud01.mycompany.com", "workloadAPISocketDir": "/run/spire/sockets2", "csiSocketPath": "/spiffe-csi/csi.sock"}
2023-05-29T15:34:16.190791564+02:00 2023-05-29T13:34:16.190Z	INFO	driver/driver.go:68	SELinux status	{"enabled": false, "enforceMode": -1, "processLabel": "", "fileLabel": ""}
2023-05-29T15:34:16.190803336+02:00 2023-05-29T13:34:16.190Z	INFO	server/server.go:50	Listening...

Very strange indeed. SELinux is for sure enabled, causing the permission denied without the init container. Could there be some native requirements (inside container) for using https://github.com/opencontainers/selinux? I haven't looked at the code, but if it is (just) a Go wrapper around chcon, this won't work from the scratch base image. 🤔 If that's the case, I would expect an error, tho.

erikgb · 2023-05-29T13:49:09Z

I did a quick test using ubi9-minimal (which includes SELinux tools) as the base image, and no difference. 😞 There is something strange going on here....

azdagron · 2023-05-29T13:53:19Z

Is SELinux fs visible in the CSI driver container? or does it need to be mounted somehow?

erikgb · 2023-05-29T14:08:34Z

@azdagron That was a good lead! Inspired by kubernetes-sigs/aws-ebs-csi-driver#1544 (comment) adding the volumes gives another behavior:

2023-05-29T14:06:17.915Z	INFO	spiffe-csi-driver/main.go:45	Starting.	{"version": "0.2.3-dev", "nodeID": "h1-a-iaut-kud01.mycompany.com", "workloadAPISocketDir": "/run/spire/sockets2", "csiSocketPath": "/spiffe-csi/csi.sock"}
2023-05-29T16:06:17.915939560+02:00 2023-05-29T14:06:17.915Z	INFO	driver/driver.go:68	SELinux status	{"enabled": true, "enforceMode": 1, "processLabel": "", "fileLabel": ""}
2023-05-29T16:06:17.916683952+02:00 2023-05-29T14:06:17.916Z	ERROR	spiffe-csi-driver/main.go:59	Failed to create driver	{"error": "failed to set the container file label on the Workload API socket directory: lsetxattr /run/spire/sockets2: invalid argument"}
2023-05-29T16:06:17.916696102+02:00 main.main
2023-05-29T16:06:17.916699682+02:00 	/code/cmd/spiffe-csi-driver/main.go:59
2023-05-29T16:06:17.916702715+02:00 runtime.main
2023-05-29T16:06:17.916705708+02:00 	/usr/local/go/src/runtime/proc.go:250

azdagron · 2023-05-29T14:16:36Z

Ok, looks like I'm going to need to get RKE2 or something similar going to get this buttoned down.

Looks like this story might be improving for k8s 1.27 as well (with CSI driver support): https://kubernetes.io/blog/2023/04/18/kubernetes-1-27-efficient-selinux-relabeling-beta/

empath-nirvana · 2024-03-19T20:07:41Z

Any progress on this? We're up to k8s 1.29 now and that feature has been on by default since 1.28

erikgb · 2024-03-20T07:06:20Z

@empath-nirvana It would help if you could test this feature branch! And suggest improvements that might solve the issue.

@azdagron Are you able to rebase this PR? Seems like there are some conflicting files.

azdagron requested a review from MarcosDY as a code owner May 14, 2023 00:53

azdagron force-pushed the azdagron/set-selinux-container-file-label branch 5 times, most recently from 8446c10 to 6d03ceb Compare May 14, 2023 01:47

azdagron mentioned this pull request May 14, 2023

fix for selinux to make it work in openshift 4.X #104

Closed

azdagron force-pushed the azdagron/set-selinux-container-file-label branch 2 times, most recently from ffecf38 to fc4fe9c Compare May 14, 2023 02:16

erikgb reviewed May 14, 2023

View reviewed changes

azdagron force-pushed the azdagron/set-selinux-container-file-label branch from f36d619 to 7c0cfad Compare May 29, 2023 13:03

azdagron added 3 commits May 29, 2023 07:04

Set container file label on workload API directory

bf2114e

This allows the driver to be used within OpenShift without using an init container to set the label. Fixes #54. Signed-off-by: Andrew Harding <[email protected]>

Address PR comments

73550ce

Signed-off-by: Andrew Harding <[email protected]>

Add some more logging

84bcf56

Signed-off-by: Andrew Harding <[email protected]>

azdagron force-pushed the azdagron/set-selinux-container-file-label branch from 7c0cfad to 84bcf56 Compare May 29, 2023 13:04

fix linting

b95cc88

erikgb mentioned this pull request Nov 1, 2023

unable to read csi driver mounted file in OpenShift #54

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Set container file label on workload API directory #105

Set container file label on workload API directory #105

azdagron commented May 14, 2023 •

edited

Loading

erikgb left a comment

erikgb May 14, 2023

azdagron May 14, 2023

erikgb May 14, 2023

azdagron May 14, 2023

azdagron commented May 14, 2023 •

edited

Loading

azdagron commented May 29, 2023

erikgb commented May 29, 2023

erikgb commented May 29, 2023

azdagron commented May 29, 2023

erikgb commented May 29, 2023 •

edited

Loading

erikgb commented May 29, 2023

azdagron commented May 29, 2023

erikgb commented May 29, 2023

azdagron commented May 29, 2023

empath-nirvana commented Mar 19, 2024

erikgb commented Mar 20, 2024

Set container file label on workload API directory #105

Are you sure you want to change the base?

Set container file label on workload API directory #105

Conversation

azdagron commented May 14, 2023 • edited Loading

erikgb left a comment

Choose a reason for hiding this comment

erikgb May 14, 2023

Choose a reason for hiding this comment

azdagron May 14, 2023

Choose a reason for hiding this comment

erikgb May 14, 2023

Choose a reason for hiding this comment

azdagron May 14, 2023

Choose a reason for hiding this comment

azdagron commented May 14, 2023 • edited Loading

azdagron commented May 29, 2023

erikgb commented May 29, 2023

erikgb commented May 29, 2023

azdagron commented May 29, 2023

erikgb commented May 29, 2023 • edited Loading

erikgb commented May 29, 2023

azdagron commented May 29, 2023

erikgb commented May 29, 2023

azdagron commented May 29, 2023

empath-nirvana commented Mar 19, 2024

erikgb commented Mar 20, 2024

azdagron commented May 14, 2023 •

edited

Loading

azdagron commented May 14, 2023 •

edited

Loading

erikgb commented May 29, 2023 •

edited

Loading