PVE 9 - NVIDIA Container Toolkit Broken

It is in the no-subscription repo. I've set it up the same way as before, like in that tutorial I linked.

@dasunsrule32 Just to double check, with your approach you don't even need to install the driver (with --no-kernel-modules) in the lxc? That would be great. I wanted to do that somehow before, but didn't know how, so I settled on installing it on both host and lxc.
Nope, no need for driver installation. The drivers get bind mounted from the host into the container. You can see the mount output I posted there.

See my post further up: https://forum.proxmox.com/threads/pve-9-nvidia-container-toolkit-broken.169364/post-797860
 
Fantastic, I'll give that a try in a fresh container.

Btw. do you have a suggestion on how to clean up what was already installed in the existing ones (with --no-kernel-modules)? I don't want to risk moving everything, but I don't want to keep doing this double install every time I update drivers either.
 
Make a backup and do not use the nvidia hook until you uninstall the manually installed driver.

Run the installer and uninstall the drivers, then you "should" be able to use the bind mount options in the config. Delete the card permissions as well; you don't need to handle those manually, as the nvidia hook handles all of that for you automatically.

However, my recommendation would be to spin up a new container in case something gets left over.
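If you do want to try cleaning one in place anyway, the rough idea is something like this (assuming the driver in the CT came from the NVIDIA .run installer, which also provides an nvidia-uninstall helper; the version placeholder is just illustrative):
Code:
# inside the container, after taking a backup
./NVIDIA-Linux-x86_64-<version>.run --uninstall
# or, if the original installer file is gone:
nvidia-uninstall

Then strip the old /dev/nv* permission lines out of the CT config and add the hook entries.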
 
Also, the hook will only work with unprivileged containers. So you'd need to continue the driver route if you are using privileged containers.
 
Unfortunately I'm using privileged. I guess spinning up a new one is the best way to go. The problem is that I'm running a bunch of docker containers in some of them, so I need to be super careful to port everything without losing something valuable. I'll give it a try.
 
You don't need privileged for Docker. My config above will actually work with Docker as well. The only thing you really need to do, if you're doing bind mounts to your data on the host, is chown that data to the correct mapped IDs; root inside the CT, for example, maps to 100000:100000 (see the sketch after the listing below).

Example of some data I have:
Code:
drwxr-xr-x - 103568 103568  8 Sep 11:09  apps
drwxr-xr-x - root   adm     2 Sep  2024  logos
drwxrwxr-x - root   adm     8 Sep 13:56  scripts
drwxr-xr-x - 100000 100000 22 Sep 16:07  stacks
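So, for example, to hand a host directory to the container's root user (the path here is made up, and 100000 assumes the default unprivileged idmap):
Code:
# on the PVE host: CT root (uid 0) maps to uid 100000 with the default idmap
chown -R 100000:100000 /srv/docker/stacks
# a CT user with uid 1000 would map to 101000:101000 instead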
 
Yeah, I know. I started with privileged since it was easier to do the passthrough, or at least I thought so. Now I'm stuck with those with a bunch of stuff in them. I'll try porting everything to an unprivileged one.
Even with the driver, privileged still isn't needed. You just have to set the card permissions on the /dev/nv* stuff in the container config. This is how we all learn though!
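For reference, that manual route is roughly a handful of lines like these in the CT config (the device majors are just examples, check ls -al /dev/nvidia* on your host; the uvm major can change between boots):
Code:
# allow the nvidia character devices
lxc.cgroup2.devices.allow: c 195:* rwm
lxc.cgroup2.devices.allow: c 506:* rwm
# bind the device nodes into the container
lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file

The hook does all of that for you, which is why I'd go that way instead.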

You got this!
 
@dasunsrule32 I don't appreciate the full quote, but I briefly tested it and quite like it. Not sure why I didn't test it sooner. I'll switch over to that and document it if it works nicely. I'm just not much of a fan of manually editing the CT config. This would also solve the driver/library discrepancy between the node and CTs. Thanks for convincing me to try it!
Edit: It's now documented as well. Let me know if you have suggestions.
 
@dasunsrule32 I tried your method on a new unprivileged LXC (Debian 13). I deployed it with the docker LXC helper script. I can get nvidia-smi to work inside the container. However, I'm having problems with docker. Trying to run ollama, I'm getting this error:
Code:
Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running prestart hook #0: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: mount error: failed to add device rules: unable to find any existing device filters attached to the cgroup: bpf_prog_query(BPF_CGROUP_DEVICE) failed: operation not permitted: unknown

I've checked the devices inside the lxc, and they are all owned by nobody:nogroup.
Bash:
root@host-server:~/docker-services/ollama# ls -al /dev/nvidia*
crw-rw-rw- 1 nobody nogroup 195,   0 Sep 30 14:05 /dev/nvidia0
crw-rw-rw- 1 nobody nogroup 195, 255 Sep 30 14:05 /dev/nvidiactl
crw-rw-rw- 1 nobody nogroup 506,   0 Sep 30 14:06 /dev/nvidia-uvm
crw-rw-rw- 1 nobody nogroup 506,   1 Sep 30 14:06 /dev/nvidia-uvm-tools

Also, compared to my old container, I'm missing these 2 in the new one (although I don't know what they are, or if they are needed):
Bash:
/dev/nvidia-caps:
total 0
cr-------- 1 root root 509, 1 Sep 30 12:06 nvidia-cap1
cr--r--r-- 1 root root 509, 2 Sep 30 12:06 nvidia-cap2

And when I do mount | grep nv, I see a lot of these errors=remount-ro mounts:
Code:
root@host-server:~/docker-services/ollama# mount|grep nv
tmpfs on /proc/driver/nvidia type tmpfs (rw,nosuid,nodev,noexec,relatime,mode=555,uid=100000,gid=100000,inode64)
tmpfs on /etc/nvidia/nvidia-application-profiles-rc.d type tmpfs (rw,nosuid,nodev,noexec,relatime,mode=555,uid=100000,gid=100000,inode64)
/dev/mapper/pve-root on /usr/bin/nvidia-smi type ext4 (ro,nosuid,nodev,relatime,errors=remount-ro)
/dev/mapper/pve-root on /usr/bin/nvidia-debugdump type ext4 (ro,nosuid,nodev,relatime,errors=remount-ro)
/dev/mapper/pve-root on /usr/bin/nvidia-persistenced type ext4 (ro,nosuid,nodev,relatime,errors=remount-ro)
/dev/mapper/pve-root on /usr/bin/nvidia-cuda-mps-control type ext4 (ro,nosuid,nodev,relatime,errors=remount-ro)
/dev/mapper/pve-root on /usr/bin/nvidia-cuda-mps-server type ext4 (ro,nosuid,nodev,relatime,errors=remount-ro)
/dev/mapper/pve-root on /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.580.82.09 type ext4 (ro,nosuid,nodev,relatime,errors=remount-ro)
/dev/mapper/pve-root on /usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.580.82.09 type ext4 (ro,nosuid,nodev,relatime,errors=remount-ro)
/dev/mapper/pve-root on /usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.580.82.09 type ext4 (ro,nosuid,nodev,relatime,errors=remount-ro)
/dev/mapper/pve-root on /usr/lib/x86_64-linux-gnu/libnvidia-gpucomp.so.580.82.09 type ext4 (ro,nosuid,nodev,relatime,errors=remount-ro)
/dev/mapper/pve-root on /usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.580.82.09 type ext4 (ro,nosuid,nodev,relatime,errors=remount-ro)
/dev/mapper/pve-root on /usr/lib/x86_64-linux-gnu/libnvidia-allocator.so.580.82.09 type ext4 (ro,nosuid,nodev,relatime,errors=remount-ro)
/dev/mapper/pve-root on /usr/lib/x86_64-linux-gnu/libnvidia-pkcs11.so.580.82.09 type ext4 (ro,nosuid,nodev,relatime,errors=remount-ro)
/dev/mapper/pve-root on /usr/lib/x86_64-linux-gnu/libnvidia-pkcs11-openssl3.so.580.82.09 type ext4 (ro,nosuid,nodev,relatime,errors=remount-ro)
/dev/mapper/pve-root on /usr/lib/x86_64-linux-gnu/libnvidia-nvvm.so.580.82.09 type ext4 (ro,nosuid,nodev,relatime,errors=remount-ro)
/dev/mapper/pve-root on /usr/lib/x86_64-linux-gnu/libvdpau_nvidia.so.580.82.09 type ext4 (ro,nosuid,nodev,relatime,errors=remount-ro)
/dev/mapper/pve-root on /usr/lib/x86_64-linux-gnu/libnvidia-encode.so.580.82.09 type ext4 (ro,nosuid,nodev,relatime,errors=remount-ro)
/dev/mapper/pve-root on /usr/lib/x86_64-linux-gnu/libnvidia-opticalflow.so.580.82.09 type ext4 (ro,nosuid,nodev,relatime,errors=remount-ro)
/dev/mapper/pve-root on /usr/lib/x86_64-linux-gnu/libnvcuvid.so.580.82.09 type ext4 (ro,nosuid,nodev,relatime,errors=remount-ro)
/dev/mapper/pve-root on /usr/lib/x86_64-linux-gnu/libnvidia-eglcore.so.580.82.09 type ext4 (ro,nosuid,nodev,relatime,errors=remount-ro)
/dev/mapper/pve-root on /usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.580.82.09 type ext4 (ro,nosuid,nodev,relatime,errors=remount-ro)
/dev/mapper/pve-root on /usr/lib/x86_64-linux-gnu/libnvidia-tls.so.580.82.09 type ext4 (ro,nosuid,nodev,relatime,errors=remount-ro)
/dev/mapper/pve-root on /usr/lib/x86_64-linux-gnu/libnvidia-glsi.so.580.82.09 type ext4 (ro,nosuid,nodev,relatime,errors=remount-ro)
/dev/mapper/pve-root on /usr/lib/x86_64-linux-gnu/libnvidia-fbc.so.580.82.09 type ext4 (ro,nosuid,nodev,relatime,errors=remount-ro)
/dev/mapper/pve-root on /usr/lib/x86_64-linux-gnu/libnvidia-rtcore.so.580.82.09 type ext4 (ro,nosuid,nodev,relatime,errors=remount-ro)
/dev/mapper/pve-root on /usr/lib/x86_64-linux-gnu/libnvoptix.so.580.82.09 type ext4 (ro,nosuid,nodev,relatime,errors=remount-ro)
/dev/mapper/pve-root on /usr/lib/x86_64-linux-gnu/libGLX_nvidia.so.580.82.09 type ext4 (ro,nosuid,nodev,relatime,errors=remount-ro)
/dev/mapper/pve-root on /usr/lib/x86_64-linux-gnu/libEGL_nvidia.so.580.82.09 type ext4 (ro,nosuid,nodev,relatime,errors=remount-ro)
/dev/mapper/pve-root on /usr/lib/x86_64-linux-gnu/libGLESv2_nvidia.so.580.82.09 type ext4 (ro,nosuid,nodev,relatime,errors=remount-ro)
/dev/mapper/pve-root on /usr/lib/x86_64-linux-gnu/libGLESv1_CM_nvidia.so.580.82.09 type ext4 (ro,nosuid,nodev,relatime,errors=remount-ro)
/dev/mapper/pve-root on /usr/lib/x86_64-linux-gnu/libnvidia-glvkspirv.so.580.82.09 type ext4 (ro,nosuid,nodev,relatime,errors=remount-ro)
/dev/mapper/pve-root on /usr/lib/firmware/nvidia/580.82.09/gsp_ga10x.bin type ext4 (ro,nosuid,nodev,relatime,errors=remount-ro)
/dev/mapper/pve-root on /usr/lib/firmware/nvidia/580.82.09/gsp_tu10x.bin type ext4 (ro,nosuid,nodev,relatime,errors=remount-ro)
tmpfs on /run/nvidia-persistenced/socket type tmpfs (rw,nosuid,nodev,noexec,relatime,size=3249644k,mode=755,inode64)
udev on /dev/nvidiactl type devtmpfs (ro,nosuid,noexec,relatime,size=16203648k,nr_inodes=4050912,mode=755,inode64)
udev on /dev/nvidia-uvm type devtmpfs (ro,nosuid,noexec,relatime,size=16203648k,nr_inodes=4050912,mode=755,inode64)
udev on /dev/nvidia-uvm-tools type devtmpfs (ro,nosuid,noexec,relatime,size=16203648k,nr_inodes=4050912,mode=755,inode64)
udev on /dev/nvidia0 type devtmpfs (ro,nosuid,noexec,relatime,size=16203648k,nr_inodes=4050912,mode=755,inode64)
proc on /proc/driver/nvidia/gpus/0000:01:00.0 type proc (ro,nosuid,nodev,noexec,relatime)

Do you have any idea? I must have messed up something, although it is a clean install.

The only other thing I did before getting this error was to disable AppArmor for docker. I had that issue with docker even in the old container after upgrading to PVE 9. Although I don't think that is the main problem here; it just wasn't getting to this point with AppArmor enabled.
 
Can you post the config of the LXC?
 
Code:
arch: amd64
cores: 16
features: keyctl=1,nesting=1,fuse=1,mknod=1
hostname: host-server
memory: 32768
net0: name=eth0,bridge=vmbr0,hwaddr=BC:24:11:5F:F0:31,ip=dhcp,tag=2,type=veth
onboot: 1
ostype: debian
rootfs: local-lvm:vm-200-disk-0,size=60G
swap: 512
tags: community-script;docker
unprivileged: 1
lxc.hook.pre-start: sh -c '[ ! -f /dev/nvidia0 ] && /usr/bin/nvidia-modprobe -c0 -u'
lxc.environment: NVIDIA_VISIBLE_DEVICES=all
lxc.environment: NVIDIA_DRIVER_CAPABILITIES=compute,graphics,utility,video
lxc.hook.mount: /usr/share/lxc/hooks/nvidia
 
I'm assuming this is docker? If so, you need to install the nvidia container toolkit in the container.
Code:
# nvidia for docker with nvidia-container-toolkit on the PVE host:
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg && \
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

apt update && apt install nvidia-container-toolkit

# add the following to the containers that will need gpu:
lxc.hook.pre-start: sh -c '[ ! -f /dev/nvidia0 ] && /usr/bin/nvidia-modprobe -c0 -u'
lxc.environment: NVIDIA_VISIBLE_DEVICES=all
lxc.environment: NVIDIA_DRIVER_CAPABILITIES=all
lxc.hook.mount: /usr/share/lxc/hooks/nvidia
Then configure Docker for runtime:
Code:
# configure nvidia in docker
nvidia-ctk runtime configure --runtime=docker
nvidia-ctk config --set nvidia-container-cli.no-cgroups -i
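After that, restart Docker and a quick test should show the GPU from inside a container (the image tag is just an example CUDA base image):
Code:
systemctl restart docker
# should print the same table nvidia-smi shows in the LXC
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi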
 
You are a lifesaver, thank you! I forgot to do this part. It's running now.

Btw. is there any reason to worry about these other things that I've mentioned:
  • nobody:nogroup
  • missing nvidia-cap1 and nvidia-cap2
  • errors=remount-ro
 
Is there any way to get a similar approach with privileged containers (so no double driver installation)?

I've tried switching to unprivileged, but this is just a huge mess for me. I don't have a lot of experience with this, and getting e.g. Frigate to work with all of this is a huge pain in the a... I have problems with bind mounts, iGPU, Coral, using the nvidia card for face recognition... Getting the nvidia drivers passed through was the least of my problems compared to all the other crap. And honestly, I don't care much about security; it's a homelab, nothing exposed to the internet. I just want minimal maintenance, and skipping the double driver installation doesn't seem to be worth all the other problems.
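(What I mean by the double install: after every driver update on the host, I re-run the matching .run installer inside each privileged CT, just without the kernel modules. Roughly:)
Code:
# inside each privileged CT, version must match the host driver
./NVIDIA-Linux-x86_64-<version>.run --no-kernel-modules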
 
If you're having issues with this then either deploy a VM with Docker in it or look for alternative install methods other than Docker. I can't help with this as I don't use Frigate, I'd take that to their forums/communities for help. Good luck!
 
I think I'll just stick with the privileged container. Apart from that double install being annoying, it worked perfectly. Switching to unprivileged is just not worth the mess for me.