Disable device node creation in CDI mode #927

Draft: wants to merge 2 commits into main

elezar (Member) commented Feb 13, 2025

This change adds a hook to disable device node creation or modification in a container by updating the ModifyDeviceFiles driver parameter.

Without this change, running a command like nvidia-smi in a container creates additional device nodes, as follows:

elezar@dgx0126:~/src/container-toolkit$ docker run --rm -ti -e NVIDIA_VISIBLE_DEVICES=runtime.nvidia.com/gpu=0 --runtime=nvidia ubuntu bash -c "ls -al /dev/nvidia*; echo ""; nvidia-smi -L; echo ""; ls -al /dev/nvidia*"
crw-rw-rw- 1 root root 195, 254 Feb 13 14:29 /dev/nvidia-modeset
crw-rw-rw- 1 root root 511,   0 Feb 13 14:29 /dev/nvidia-uvm
crw-rw-rw- 1 root root 511,   1 Feb 13 14:29 /dev/nvidia-uvm-tools
crw-rw-rw- 1 root root 195,   0 Feb 13 14:29 /dev/nvidia0
crw-rw-rw- 1 root root 195, 255 Feb 13 14:29 /dev/nvidiactl

GPU 0: Tesla V100-SXM2-16GB-N (UUID: GPU-edfee158-11c1-52b8-0517-92f30e7fac88)

crw-rw-rw- 1 root root 195, 254 Feb 13 14:29 /dev/nvidia-modeset
crw-rw-rw- 1 root root 511,   0 Feb 13 14:29 /dev/nvidia-uvm
crw-rw-rw- 1 root root 511,   1 Feb 13 14:29 /dev/nvidia-uvm-tools
crw-rw-rw- 1 root root 195,   0 Feb 13 14:29 /dev/nvidia0
crw-rw-rw- 1 root root 195,   1 Feb 13 14:29 /dev/nvidia1
crw-rw-rw- 1 root root 195,   2 Feb 13 14:29 /dev/nvidia2
crw-rw-rw- 1 root root 195,   3 Feb 13 14:29 /dev/nvidia3
crw-rw-rw- 1 root root 195,   4 Feb 13 14:29 /dev/nvidia4
crw-rw-rw- 1 root root 195,   5 Feb 13 14:29 /dev/nvidia5
crw-rw-rw- 1 root root 195,   6 Feb 13 14:29 /dev/nvidia6
crw-rw-rw- 1 root root 195,   7 Feb 13 14:29 /dev/nvidia7
crw-rw-rw- 1 root root 195, 255 Feb 13 14:29 /dev/nvidiactl

/dev/nvidia-caps:
total 0
drwxr-xr-x 2 root root     80 Feb 13 14:29 .
drwxr-xr-x 7 root root    640 Feb 13 14:29 ..
cr-------- 1 root root 238, 1 Feb 13 14:29 nvidia-cap1
cr--r--r-- 1 root root 238, 2 Feb 13 14:29 nvidia-cap2

With the change applied, the set of device nodes is the same before and after running nvidia-smi:

elezar@dgx0126:~/src/container-toolkit$ docker run --rm -ti -e NVIDIA_VISIBLE_DEVICES=runtime.nvidia.com/gpu=0 --runtime=nvidia ubuntu bash -c "ls -al /dev/nvidia*; echo ""; nvidia-smi -L; echo ""; ls -al /dev/nvidia*"
crw-rw-rw- 1 root root 195, 254 Feb 13 14:27 /dev/nvidia-modeset
crw-rw-rw- 1 root root 511,   0 Feb 13 14:27 /dev/nvidia-uvm
crw-rw-rw- 1 root root 511,   1 Feb 13 14:27 /dev/nvidia-uvm-tools
crw-rw-rw- 1 root root 195,   0 Feb 13 14:27 /dev/nvidia0
crw-rw-rw- 1 root root 195, 255 Feb 13 14:27 /dev/nvidiactl

GPU 0: Tesla V100-SXM2-16GB-N (UUID: GPU-edfee158-11c1-52b8-0517-92f30e7fac88)

crw-rw-rw- 1 root root 195, 254 Feb 13 14:27 /dev/nvidia-modeset
crw-rw-rw- 1 root root 511,   0 Feb 13 14:27 /dev/nvidia-uvm
crw-rw-rw- 1 root root 511,   1 Feb 13 14:27 /dev/nvidia-uvm-tools
crw-rw-rw- 1 root root 195,   0 Feb 13 14:27 /dev/nvidia0
crw-rw-rw- 1 root root 195, 255 Feb 13 14:27 /dev/nvidiactl

/dev/nvidia-caps:
total 0
drwxr-xr-x 2 root root     80 Feb 13 14:27 .
drwxr-xr-x 7 root root    500 Feb 13 14:27 ..
cr-------- 1 root root 238, 1 Feb 13 14:27 nvidia-cap1
cr--r--r-- 1 root root 238, 2 Feb 13 14:27 nvidia-cap2

This is because the ModifyDeviceFiles parameter is set to 0 inside the container:

elezar@dgx0126:~/src/container-toolkit$ docker run --rm -ti -e NVIDIA_VISIBLE_DEVICES=runtime.nvidia.com/gpu=0 --runtime=nvidia ubuntu bash -c "cat /proc/driver/nvidia/params | grep ModifyDeviceFiles"
ModifyDeviceFiles: 0

@elezar elezar force-pushed the disable-device-node-creation branch from 00c5d3b to ba21d0e Compare February 13, 2025 14:33
@elezar elezar marked this pull request as draft February 13, 2025 14:33
@elezar elezar force-pushed the disable-device-node-creation branch from 6b3a5fa to 97c33ef Compare February 13, 2025 22:14
If required, this hook creates a modified params file (with ModifyDeviceFiles: 0) in a tmpfs
and mounts this over /proc/driver/nvidia/params.

This prevents device node creation when running tools such as nvidia-smi.
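
As a rough illustration of the approach described above (a sketch, not the hook's actual implementation), the rewrite of the params content could look like the following in shell. The function names and the /run/params-override path are hypothetical. The mount steps are shown only as comments because they need root privileges and the container's mount namespace.

```shell
#!/bin/sh

# Hypothetical helper: read driver params text on stdin and write it to
# stdout with the ModifyDeviceFiles entry forced to 0. All other lines
# pass through unchanged.
disable_modify_device_files() {
    sed 's/^ModifyDeviceFiles:.*$/ModifyDeviceFiles: 0/'
}

# Hypothetical helper: succeed only if the params text on stdin still has
# ModifyDeviceFiles set to 1, i.e. a modified params file is required.
needs_rewrite() {
    grep -q '^ModifyDeviceFiles: 1$'
}

# Sketch of the overall flow (assumed, not executed here):
#   mount -t tmpfs tmpfs /run/params-override
#   disable_modify_device_files < /proc/driver/nvidia/params \
#       > /run/params-override/params
#   mount --bind /run/params-override/params /proc/driver/nvidia/params
```

For example, `printf 'ModifyDeviceFiles: 1\n' | disable_modify_device_files` prints `ModifyDeviceFiles: 0`, which matches the value reported by the `grep ModifyDeviceFiles` check in the pull request description.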

Signed-off-by: Evan Lezar <[email protected]>
@elezar elezar force-pushed the disable-device-node-creation branch from 97c33ef to 8ca8b00 Compare February 13, 2025 22:37
@elezar elezar force-pushed the disable-device-node-creation branch from 8ca8b00 to a802cc4 Compare February 13, 2025 22:59