PVE 9 - NVIDIA Container Toolkit Broken

rausche

New Member
Aug 7, 2025
Following the upgrade from PVE 8 to PVE 9, I can no longer start my LXCs that use the NVIDIA Container Toolkit.

I have a variety of containers that attach to the same GPU for things like transcoding, rendering, Stable Diffusion, LLMs, etc -- so passthrough/dedicated assignment is a nuisance, and the container toolkit provides an elegant method to share the GPU between containers seamlessly and with low administrative overhead.

Some details:
  • was previously on the optional 6.14 kernel on PVE 8, no issues there
  • using NVIDIA Driver 570.181, upgraded from 570.172.08 during troubleshooting
  • using NVIDIA Container Toolkit 1.18.0~rc.2, upgraded from 1.17.8 during troubleshooting
I use the following lines in the LXC config to hook in the toolkit:
lxc.hook.pre-start: sh -c '[ ! -f /dev/nvidia-uvm ] && /usr/bin/nvidia-modprobe -c0 -u'
lxc.environment: NVIDIA_VISIBLE_DEVICES=all
lxc.environment: NVIDIA_DRIVER_CAPABILITIES=compute,utility,video
lxc.hook.mount: /usr/share/lxc/hooks/nvidia
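
(Side note: verifying the host-side stack independently of LXC is just a matter of exercising the driver and toolkit CLIs directly; if either of these fails, the problem is below the container layer.)

Code:
# host-side sanity check, independent of LXC
nvidia-smi                # is the kernel driver loaded and responding?
nvidia-container-cli list # can the toolkit enumerate devices and libraries?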

In my LXC debug logs during container startup I see entries like the following:
  • DEBUG utils - ../src/lxc/utils.c:run_buffer:560 - Script exec /usr/share/lxc/hooks/nvidia 117 lxc mount produced output: + exec nvidia-container-cli --user configure --no-cgroups --ldconfig=@/usr/sbin/ldconfig --device=all --compute --utility --video /usr/lib/x86_64-linux-gnu/lxc/rootfs
  • DEBUG utils - ../src/lxc/utils.c:run_buffer:560 - Script exec /usr/share/lxc/hooks/nvidia 117 lxc mount produced output: nvidia-container-cli: mount error: open failed: /usr/lib/x86_64-linux-gnu/lxc/rootfs/proc/1/ns/mnt: permission denied
I notice that the hook script is passing '--no-cgroups', presumably because this is an unprivileged container, and this seems to be the cause of my problems. However, since the NVIDIA Container Toolkit has apparently supported cgroups for a long time, I tried modifying the hook script to remove that argument (sketched after the log excerpt below) -- which causes the following debug lines to appear instead:
  • DEBUG utils - ../src/lxc/utils.c:run_buffer:560 - Script exec /usr/share/lxc/hooks/nvidia 117 lxc mount produced output: + exec nvidia-container-cli --user configure --ldconfig=@/usr/sbin/ldconfig --device=all --compute --utility --video /usr/lib/x86_64-linux-gnu/lxc/rootfs
  • DEBUG utils - ../src/lxc/utils.c:run_buffer:560 - Script exec /usr/share/lxc/hooks/nvidia 117 lxc mount produced output: nvidia-container-cli: container error: failed to get device cgroup mount path: relative path in mount prefix: /../../..
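For anyone wanting to reproduce that edit: it amounts to stripping the flag out of the packaged hook script. I edited the file by hand; the one-liner below is an untested equivalent, and note that the file is replaced on lxc package updates, so the change has to be re-applied:

Code:
# back up the shipped hook, then strip the --no-cgroups flag from it
cp /usr/share/lxc/hooks/nvidia /usr/share/lxc/hooks/nvidia.bak
sed -i 's/ --no-cgroups//' /usr/share/lxc/hooks/nvidia
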
During troubleshooting I also tried running the container as a privileged container; however, the debug log reports that the '/usr/share/lxc/hooks/nvidia' hook can only be run in the unprivileged context.

I am at something of a loss here, and hoping not to have to revert to PVE 8. Is there a convenient way to embed appropriate cgroup2 support into the LXC config to make it usable with the NVIDIA Container Toolkit?
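
(For context, the per-container wiring I'd like to avoid maintaining by hand looks roughly like the lines below. The cgroup2 device-allow and mount-entry keys are standard LXC config, but the device major numbers are only examples and vary between hosts -- check ls -l /dev/nvidia* -- and this approach bypasses the toolkit entirely.)

Code:
# illustrative only: classic manual GPU exposure for an LXC container,
# added to /etc/pve/lxc/<vmid>.conf; device major numbers vary per host
lxc.cgroup2.devices.allow: c 195:* rwm
lxc.cgroup2.devices.allow: c 509:* rwm
lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file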
 
Same issue:

Code:
lxc-start <contnum> 20250809114632.390 DEBUG    cgfsng - ../src/lxc/cgroups/cgfsng.c:__cgroupfs_mount:2197 - Mounted cgroup filesystem cgroup2 onto 19((null))
lxc-start <contnum> 20250809114632.390 INFO     utils - ../src/lxc/utils.c:run_script_argv:587 - Executing script "/usr/share/lxcfs/lxc.mount.hook" for container "<contnum>", config section "lxc"
lxc-start <contnum> 20250809114632.411 INFO     utils - ../src/lxc/utils.c:run_script_argv:587 - Executing script "/usr/share/lxc/hooks/nvidia" for container "<contnum>", config section "lxc"
lxc-start <contnum> 20250809114632.424 DEBUG    utils - ../src/lxc/utils.c:run_buffer:560 - Script exec /usr/share/lxc/hooks/nvidia <contnum> lxc mount produced output: mkdir: cannot create directory ‘/var/lib/lxc/<contnum>/hook’
lxc-start <contnum> 20250809114632.424 DEBUG    utils - ../src/lxc/utils.c:run_buffer:560 - Script exec /usr/share/lxc/hooks/nvidia <contnum> lxc mount produced output: : Permission denied
lxc-start <contnum> 20250809114632.428 DEBUG    utils - ../src/lxc/utils.c:run_buffer:560 - Script exec /usr/share/lxc/hooks/nvidia <contnum> lxc mount produced output: + exec nvidia-container-cli --user configure --no-cgroups --ldconfig=@/usr/sbin/ldconfig --device=all --compute --compat32 --display --graphics --utility --video /usr/lib/x86_64-linux-gnu/lxc/rootfs
lxc-start <contnum> 20250809114632.448 DEBUG    utils - ../src/lxc/utils.c:run_buffer:560 - Script exec /usr/share/lxc/hooks/nvidia <contnum> lxc mount produced output: nvidia-container-cli: mount error: open failed: /usr/lib/x86_64-linux-gnu/lxc/rootfs/proc/1/ns/mnt: permission denied

Might be more of an NVIDIA Container Toolkit issue (compatibility with the PVE 9 LXC version, perhaps?) -- but it's useful for other Proxmox users to know they should not update to PVE 9 for now if they rely on nvidia-container-toolkit.
 
Might be more of an NVIDIA Container Toolkit issue (compatibility with the PVE 9 LXC version, perhaps?)
PVE 9 moves from LXC 6.0.0 to LXC 6.0.4, which was released in April 2025 -- so I think NVIDIA has had time to build in support for this LXC version and/or for compatibility reports to be filed... but I don't find any similar issues when researching these debug log entries, other than this issue, which is somewhat similar. However, the workaround for that issue is no longer valid for PVE 9.
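
(For anyone wanting to confirm which LXC userspace their node actually ships, something like this should do it:)

Code:
# check the LXC version installed on the PVE node
lxc-start --version
pveversion -v | grep -i lxc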

I think PVE 9's implementation of LXC, with the full removal of cgroup v1 mounts, is now 'botched' for the container toolkit because:
  1. the container toolkit hook script is somehow deciding to launch with the --no-cgroups flag; I'm not sure what the logic for this is in the script
  2. with the --no-cgroups flag, the hook script (and the container itself?) is not allowed to interact with the /proc filesystem read-write before pivot_root occurs
  3. without the --no-cgroups flag, the hook script cannot determine the path to the temporary /proc filesystem, since the container passes a relative path instead of an absolute path
Additionally, using native LXC arguments like lxc.namespace.clone=proc sys mnt or lxc.mount.auto=proc:rw sys:rw cgroup-full:rw (shown below as they would appear in the container config) either doesn't work or appears to break other PVE-related components required to launch the container successfully.
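
For reference, these are the kinds of lines I experimented with, as they would appear in /etc/pve/lxc/<vmid>.conf (neither helped, and some broke PVE's own startup handling):

Code:
# experiments that did not pan out
lxc.namespace.clone: proc sys mnt
lxc.mount.auto: proc:rw sys:rw cgroup-full:rw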

But, all said, I am fairly new to debugging LXC and I'm somewhat grasping at straws here. I feel as though the answer is probably somewhere in the LXC documentation, but I'm not actually sure this is resolvable without modifying PVE's own container-launching logic and parameters.
 
The issue might be at least partly related to AppArmor. It looks like the migration to Debian Trixie implied a migration to AppArmor 4, which in turn brought new defaults and configuration-file changes that Proxmox is having some trouble fully catching up with (see, for instance, the bug "apparmor problem mqueue in LXC" and the forum thread "Proxmox VE 9.0 BETA LCX Docker not working").

Here are the system logs for AppArmor when trying to start the container with the default AppArmor configuration and with the NVIDIA Container Toolkit hooks. I must emphasize that the container is unprivileged.

Code:
[X.677367] audit: type=1400 audit(X.806:319): apparmor="DENIED" operation="getattr" class="posix_mqueue" profile="/usr/bin/lxc-start" name="/" pid=1656100 comm="vgs" requested="getattr" denied="getattr" class="posix_mqueue" fsuid=0 ouid=0
[X.719834] audit: type=1400 audit(X.849:320): apparmor="DENIED" operation="getattr" class="posix_mqueue" profile="/usr/bin/lxc-start" name="/" pid=1656101 comm="lvs" requested="getattr" denied="getattr" class="posix_mqueue" fsuid=0 ouid=0
[X.932507] audit: type=1400 audit(X.061:321): apparmor="STATUS" operation="profile_load" profile="/usr/bin/lxc-start" name="lxc-108_</var/lib/lxc>" pid=1656118 comm="apparmor_parser"
[X.530805] audit: type=1400 audit(X.660:322): apparmor="STATUS" operation="profile_remove" profile="/usr/bin/lxc-start" name="lxc-108_</var/lib/lxc>" pid=1656282 comm="apparmor_parser"

I tried to implement the mqueue workaround advised in the bug "apparmor problem mqueue in LXC", but this did not change anything. It is likely useless anyway, since the patch has already been implemented and distributed, and it may apply to privileged containers only.

I swear I could get the container to start correctly (though with the NVIDIA hooks not necessarily working -- I did not check) when I changed the container config to COMPLETELY DISABLE AppArmor, using lxc.apparmor.profile = unconfined (as described on the Proxmox wiki "Linux_Container" page). However, this is not reliably reproducible, so on top of the AppArmor issue there might be race conditions or conflicts with whatever the NVIDIA hook script is actually doing.
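
For reference, that test amounted to a single line in /etc/pve/lxc/<vmid>.conf; it switches off AppArmor confinement for the container entirely, so it is only suitable for testing:

Code:
# disable AppArmor for this container (testing only)
lxc.apparmor.profile: unconfined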
 
On that beta thread, this stuck out to me. I wonder if updating to a Trixie LXC image would do the trick?
FYI, I can see that in a Debian bookworm based container, but it seems alright in a Debian Trixie based one, at least an ls /dev/mqueue works there.
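
(Running that check from the PVE host is quick, e.g.:)

Code:
# check from the host whether /dev/mqueue is usable inside a container
pct exec <vmid> -- ls -la /dev/mqueue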
 
I was working with the Incus project owner on their forums, and it looks like this is a regression in LXC 6.0.4 that was fixed in 6.0.5.

I'm going to open a bug report with Proxmox.

Thread: https://discuss.linuxcontainers.org/t/lxc-nvidia-container-toolkit/24563/4?u=dasunsrule32
6.0.5 release thread: https://discuss.linuxcontainers.org/t/lxc-6-0-5-lts-has-been-released/24438
This does sound promising.

Keep us posted. I'll be happy to help test any resolution or workaround offered.