[SOLVED] Can't get Nvidia GPU passthrough to work on LXC

That only removes packages whose names start with nvidia, though. Something is very wrong here: you have packages for driver 550, DKMS has a 595 driver built for kernel 7.0.2-6-pve, and your current kernel is 7.0.12-1-pve. None of this matches up.
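To see that kind of mismatch directly on the host, a small check can compare what DKMS has built against the running kernel. This is only a sketch: it assumes the dkms 3.x status format (module/version, kernel, arch: status), older dkms releases print a different layout, and dkms_has_nvidia_for is a made-up helper name.

```shell
#!/bin/sh
# Sketch: does DKMS report an installed nvidia module for a given kernel?
# Assumes dkms 3.x status lines such as:
#   nvidia/595.84, 7.0.2-6-pve, x86_64: installed
# dkms_has_nvidia_for is a hypothetical helper, not part of any package.
dkms_has_nvidia_for() {
  # $1 = text of `dkms status`, $2 = kernel release (e.g. from `uname -r`)
  printf '%s\n' "$1" | awk -F', ' -v k="$2" '
    $1 ~ /^nvidia\// && $2 == k && $3 ~ /: installed/ { found = 1 }
    END { exit !found }'
}

# Only runs on a host that actually has dkms installed.
if command -v dkms >/dev/null 2>&1; then
  running=$(uname -r)
  if dkms_has_nvidia_for "$(dkms status 2>/dev/null)" "$running"; then
    echo "OK: DKMS has an installed nvidia module for $running"
  else
    echo "MISMATCH: no installed nvidia DKMS module for $running" >&2
    dkms status
  fi
fi
```

On the host, a MISMATCH result is exactly the situation described above: a module built for 7.0.2-6-pve while 7.0.12-1-pve is running.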
Please try
Bash:
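# Purge every installed package with "nvidia" anywhere in its name
apt autopurge "*nvidia*"
# Run the uninstaller of the most recently modified NVIDIA .run file in this directory
./$(ls -t NVIDIA*.run | head -n 1) --dkms --uninstall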
Then install the current .run file again as per my guide. Show me all the outputs of the process.
This might also be needed after the install:
Bash:
dpkg-reconfigure nvidia-kernel-dkms
# Likely not needed
dkms autoinstall
Reboot, then share the output of the last commands again.
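After the reboot, one quick sanity check is whether the loaded kernel module and the userspace tools report the same driver version. A minimal sketch, assuming the first dotted number on the first line of /proc/driver/nvidia/version is the module version; first_version and versions_match are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: after a reinstall and reboot, compare the driver version reported by
# the loaded kernel module with the one reported by nvidia-smi.

# Prints the first dotted version number (e.g. 595.84) on the first input line.
first_version() {
  head -n 1 | grep -oE '[0-9]+\.[0-9]+(\.[0-9]+)?' | head -n 1
}

# Succeeds only if both versions are non-empty and identical.
versions_match() {
  [ -n "$1" ] && [ "$1" = "$2" ]
}

# Only runs where the NVIDIA driver is actually loaded.
if [ -r /proc/driver/nvidia/version ] && command -v nvidia-smi >/dev/null 2>&1; then
  kmod=$(first_version < /proc/driver/nvidia/version)
  user=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader 2>&1 | first_version)
  if versions_match "$kmod" "$user"; then
    echo "OK: kernel module and nvidia-smi both report $kmod"
  else
    echo "MISMATCH: kernel module '$kmod', nvidia-smi '$user'" >&2
  fi
fi
```

If nvidia-smi cannot talk to the module at all (for example the well-known "Driver/library version mismatch" error), the userspace side comes back empty and the script reports a mismatch.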
 
I ran the commands to remove the NVIDIA stuff, then reinstalled the driver using the link you sent earlier, then rebooted without running the second batch of commands.

Now I get:

Bash:
# lspci -vnnk | awk '/VGA/{print $0}' RS=
nvidia-smi
pct config 103
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GA104 [GeForce RTX 3060 Ti] [10de:2486] (rev a1) (prog-if 00 [VGA controller])
        Subsystem: PC Partner Limited / Sapphire Technology Device [174b:a632]
        Flags: bus master, fast devsel, latency 0, IRQ 107, IOMMU group 12
        Memory at f5000000 (32-bit, non-prefetchable) [size=16M]
        Memory at fa00000000 (64-bit, prefetchable) [size=8G]
        Memory at fc00000000 (64-bit, prefetchable) [size=32M]
        I/O ports at f000 [size=128]
        Expansion ROM at f6000000 [virtual] [disabled] [size=512K]
        Capabilities: [60] Power Management version 3
        Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Capabilities: [78] Express Legacy Endpoint, IntMsgNum 0
        Capabilities: [b4] Vendor Specific Information: Len=14 <?>
        Capabilities: [100] Virtual Channel
        Capabilities: [250] Latency Tolerance Reporting
        Capabilities: [258] L1 PM Substates
        Capabilities: [128] Power Budgeting <?>
        Capabilities: [420] Advanced Error Reporting
        Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
        Capabilities: [900] Secondary PCI Express
        Capabilities: [bb0] Physical Resizable BAR
        Capabilities: [c1c] Physical Layer 16.0 GT/s <?>
        Capabilities: [d00] Lane Margining at the Receiver
        Capabilities: [e00] Data Link Feature <?>
        Kernel driver in use: nvidia
        Kernel modules: nvidiafb, nouveau, nova_core, nvidia_drm, nvidia
07:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Raphael [1002:164e] (rev c4) (prog-if 00 [VGA controller])
        Subsystem: ASUSTeK Computer Inc. Device [1043:8877]
        Flags: bus master, fast devsel, latency 0, IRQ 81, IOMMU group 15
        Memory at fc10000000 (64-bit, prefetchable) [size=256M]
        Memory at fc20000000 (64-bit, prefetchable) [size=2M]
        I/O ports at d000 [size=256]
        Memory at f6500000 (32-bit, non-prefetchable) [size=512K]
        Capabilities: [48] Vendor Specific Information: Len=08 <?>
        Capabilities: [50] Power Management version 3
        Capabilities: [64] Express Legacy Endpoint, IntMsgNum 0
        Capabilities: [a0] MSI: Enable- Count=1/4 Maskable- 64bit+
        Capabilities: [c0] MSI-X: Enable+ Count=4 Masked-
        Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
        Capabilities: [270] Secondary PCI Express
        Capabilities: [2a0] Access Control Services
        Capabilities: [2b0] Address Translation Service (ATS)
        Capabilities: [2c0] Page Request Interface (PRI)
        Capabilities: [2d0] Process Address Space ID (PASID)
        Capabilities: [410] Physical Layer 16.0 GT/s <?>
        Capabilities: [450] Lane Margining at the Receiver
        Kernel driver in use: amdgpu
        Kernel modules: amdgpu
Mon Jun 22 15:47:11 2026       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 595.84                 Driver Version: 595.84         CUDA Version: 13.2     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3060 Ti     On  |   00000000:01:00.0 Off |                  N/A |
| 30%   46C    P8              7W /  200W |       1MiB /   8192MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
arch: amd64
cores: 6
features: nesting=1
hostname: media
memory: 4096
mp0: /mnt/usbstick,mp=/mnt/usbstick
net0: name=eth0,bridge=vmbr0,firewall=1,gw=192.168.2.254,hwaddr=BC:24:11:33:11:54,ip=192.168.2.207/24,type=veth
onboot: 1
ostype: debian
rootfs: media:103/vm-103-disk-0.raw,size=3500G
swap: 2048
unprivileged: 1
lxc.cgroup2.devices.allow: c 226:0 rwm
lxc.cgroup2.devices.allow: c 226:128 rwm
lxc.mount.entry: /dev/dri/renderD128 dev/dri/renderD128 none bind,optional,create=file
lxc.idmap: u 0 100000 65536
lxc.idmap: g 0 100000 44
lxc.idmap: g 44 44 1
lxc.idmap: g 45 100045 62
lxc.idmap: g 107 993 1
lxc.idmap: g 108 100108 65428

It's starting to look a lot better, isn't it? :)
 
I can't start the container anymore:
Bash:
run_buffer: 569 Script exited with status 255
lxc_init: 1037 Failed to run lxc.hook.pre-start for container "103"
__lxc_start: 2208 Failed to initialize container "103"
TASK ERROR: startup for container '103' failed
 
Bash:
# pct start 103 --debug
run_buffer: 569 Script exited with status 255
lxc_init: 1037 Failed to run lxc.hook.pre-start for container "103"
__lxc_start: 2208 Failed to initialize container "103"
hostid 100000 range 44
INFO     confile - ../src/lxc/confile.c:set_config_idmaps:2297 - Read uid map: type g nsid 44 hostid 44 range 1
INFO     confile - ../src/lxc/confile.c:set_config_idmaps:2297 - Read uid map: type g nsid 45 hostid 100045 range 62
INFO     confile - ../src/lxc/confile.c:set_config_idmaps:2297 - Read uid map: type g nsid 107 hostid 993 range 1
INFO     confile - ../src/lxc/confile.c:set_config_idmaps:2297 - Read uid map: type g nsid 108 hostid 100108 range 65428
INFO     lsm - ../src/lxc/lsm/lsm.c:lsm_init_static:38 - Initialized LSM security driver AppArmor
INFO     utils - ../src/lxc/utils.c:run_script_argv:585 - Executing script "/usr/share/lxc/hooks/lxc-pve-prestart-hook" for container "103", config section "lxc"
DEBUG    utils - ../src/lxc/utils.c:run_buffer:558 - Script exec /usr/share/lxc/hooks/lxc-pve-prestart-hook '103' 'lxc' 'pre-start' produced output: losetup: /mnt/pve/media/images/103/vm-103-disk-0.raw: Warning: file does not end on a 512-byte sector boundary; the remaining end of the file will be ignored.

DEBUG    utils - ../src/lxc/utils.c:run_buffer:558 - Script exec /usr/share/lxc/hooks/lxc-pve-prestart-hook '103' 'lxc' 'pre-start' produced output: mount: /var/lib/lxc/.pve-staged-mounts/rootfs: can't read superblock on /dev/loop2.
dmesg(1) may have more information after failed mount system call.

DEBUG    utils - ../src/lxc/utils.c:run_buffer:558 - Script exec /usr/share/lxc/hooks/lxc-pve-prestart-hook '103' 'lxc' 'pre-start' produced output: command 'mount /dev/loop2 /var/lib/lxc/.pve-staged-mounts/rootfs' failed: exit code 32

ERROR    utils - ../src/lxc/utils.c:run_buffer:569 - Script exited with status 255
ERROR    start - ../src/lxc/start.c:lxc_init:1037 - Failed to run lxc.hook.pre-start for container "103"
ERROR    start - ../src/lxc/start.c:__lxc_start:2208 - Failed to initialize container "103"
INFO     utils - ../src/lxc/utils.c:run_script_argv:585 - Executing script "/usr/share/lxcfs/lxc.reboot.hook" for container "103", config section "lxc"
startup for container '103' failed

I checked: the GPU names/IDs didn't switch around.
 
Hmm. Please share the output of:
Bash:
pct fsck 103
pct config 103
Not sure what happened here. Try with a minimal config or restore a backup and see if it works then.
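The losetup warning in the debug output ("file does not end on a 512-byte sector boundary") can be checked on its own: a raw image that Proxmox created with a size like 3500G is a whole number of sectors, so stray trailing bytes suggest the file was truncated or resized outside of Proxmox. A minimal sketch, assuming GNU stat as shipped on Proxmox/Debian; check_raw_image is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: report whether a raw image's size is a whole number of 512-byte
# sectors (the condition losetup warns about). Read-only: it only stats the file.
check_raw_image() {
  local img="$1" size rem
  size=$(stat -c %s -- "$img") || return 2
  rem=$(( size % 512 ))
  if [ "$rem" -eq 0 ]; then
    echo "$img: $size bytes, sector-aligned"
    return 0
  fi
  echo "$img: $size bytes, $rem stray bytes after the last full sector" >&2
  return 1
}
```

For this CT that would be check_raw_image /mnt/pve/media/images/103/vm-103-disk-0.raw. It never writes to the image.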
 
Bash:
# pct fsck 103
pct config 103
fsck from util-linux 2.41
/mnt/pve/media/images/103/vm-103-disk-0.raw:
The superblock could not be read or does not describe a valid ext2/ext3/ext4
filesystem.  If the device is valid and it really contains an ext2/ext3/ext4
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 <device>
 or
    e2fsck -b 32768 <device>

fsck.ext4: Superblock checksum does not match superblock while trying to open /mnt/pve/media/images/103/vm-103-disk-0.raw
command 'fsck -a -l /mnt/pve/media/images/103/vm-103-disk-0.raw' failed: exit code 8
arch: amd64
cores: 6
features: nesting=1
hostname: media
memory: 4096
mp0: /mnt/usbstick,mp=/mnt/usbstick
net0: name=eth0,bridge=vmbr0,firewall=1,gw=192.168.2.254,hwaddr=BC:24:11:33:11:54,ip=192.168.2.207/24,type=veth
onboot: 1
ostype: debian
rootfs: media:103/vm-103-disk-0.raw,size=3500G
swap: 2048
unprivileged: 1
lxc.hook.pre-start: sh -c '[ ! -f /dev/nvidia0 ] && /usr/bin/nvidia-modprobe -c0 -u'
lxc.environment: NVIDIA_VISIBLE_DEVICES=all
lxc.environment: NVIDIA_DRIVER_CAPABILITIES=all
lxc.hook.mount: /usr/share/lxc/hooks/nvidia
lxc.cgroup2.devices.allow: c 226:0 rwm
lxc.cgroup2.devices.allow: c 226:128 rwm
lxc.mount.entry: /dev/dri/renderD128 dev/dri/renderD128 none bind,optional,create=file
lxc.idmap: u 0 100000 65536
lxc.idmap: g 0 100000 44
lxc.idmap: g 44 44 1
lxc.idmap: g 45 100045 62
lxc.idmap: g 107 993 1
lxc.idmap: g 108 100108 65428

I remember adding
Bash:
lxc.idmap: u 0 100000 65536
lxc.idmap: g 0 100000 44
lxc.idmap: g 44 44 1
lxc.idmap: g 45 100045 62
lxc.idmap: g 107 993 1
lxc.idmap: g 108 100108 65428

with the help of a YouTube tutorial before I came to this forum. Now I don't know how I did it or how to remove it again :( I'm not even sure exactly which parts the video told me to add, only that the lines above were among them.
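For reference, each lxc.idmap line reads: type (u or g), first ID inside the CT, first ID on the host, and how many IDs to map. So g 44 44 1 passes the CT's GID 44 (normally video) straight through to host GID 44, g 107 993 1 maps CT GID 107 to host GID 993 (presumably the host's render group), and everything else is shifted by 100000 as usual for an unprivileged CT. The lines only make sense as a complete set, so either keep all of them or delete all of them; with none, Proxmox falls back to its default mapping. Mapping host IDs like this also requires matching root:44:1 and root:993:1 entries in the host's /etc/subgid. A sketch with a hypothetical check_idmap helper that prints each range and checks that the CT side covers 0 to 65535 without gaps or overlaps:

```shell
#!/bin/sh
# Sketch: read Proxmox-style "lxc.idmap: TYPE NSID HOSTID COUNT" lines on stdin
# and, for one TYPE (u or g), print each range and verify that the CT-side IDs
# cover 0..65535 in order with no gaps or overlaps.
check_idmap() {
  awk -v t="$1" '
    BEGIN { expect = 0 }
    $1 == "lxc.idmap:" && $2 == t {
      if ($3 != expect) { printf "gap/overlap at CT id %d\n", expect; bad = 1 }
      printf "%s %d-%d -> host %d-%d\n", t, $3, $3 + $5 - 1, $4, $4 + $5 - 1
      expect = $3 + $5
    }
    END {
      if (expect != 65536) { printf "CT ids end at %d, expected 65536\n", expect; bad = 1 }
      exit bad
    }'
}
```

Fed the six lines above with type g, it prints the ranges 0-43, 44-44, 45-106, 107-107 and 108-65535 and exits successfully.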
 
Something seems to be wrong with your disk. Try restoring a backup or testing with a new CT. You also need to remove the lxc.cgroup2 and lxc.mount.entry lines, as mentioned earlier. Just use nano to edit the config file.
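Without a backup, one last read-only attempt is what the fsck output itself suggests: point e2fsck at a backup superblock. The 8193 and 32768 in that message are the first backup locations for 1 KiB and 4 KiB block sizes. This sketch lists further candidates, assuming the default sparse_super layout (backups in block group 1 and in groups that are powers of 3, 5 and 7) and 8 × block-size blocks per group; backup_superblocks is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: candidate backup-superblock locations for ext2/3/4, assuming the
# default sparse_super layout and blocks_per_group = 8 * block_size.

# Succeeds if $1 is a power of $2 (including $2^0 = 1). $1 must be >= 1.
is_power_of() {
  local n="$1"
  while [ $(( n % $2 )) -eq 0 ]; do n=$(( n / $2 )); done
  [ "$n" -eq 1 ]
}

# $1 = block size in bytes, $2 = number of block groups in the filesystem.
# Prints one candidate block number per line (usable with e2fsck -b).
backup_superblocks() {
  local per_group first_block g
  per_group=$(( $1 * 8 ))
  first_block=0
  [ "$1" -eq 1024 ] && first_block=1   # 1 KiB filesystems start at block 1
  g=1
  while [ "$g" -lt "$2" ]; do
    if is_power_of "$g" 3 || is_power_of "$g" 5 || is_power_of "$g" 7; then
      echo $(( g * per_group + first_block ))
    fi
    g=$(( g + 1 ))
  done
}
```

With 4 KiB blocks (the usual default for a filesystem this large), each group spans 128 MiB, so the group count is roughly the filesystem size divided by 128 MiB. After copying the image somewhere safe, a read-only attempt could look like e2fsck -n -b 98304 -B 4096 /path/to/copy.raw, where the path is a placeholder and -n keeps e2fsck from writing anything.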
 
This happened after #12, as I mentioned in #14, btw. Also, there is no backup: it wouldn't let me back up because of "insufficient space" in a temp folder or something. And where do I find that config file again?
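Proxmox keeps each container's configuration in /etc/pve/lxc/<vmid>.conf, so for this CT the file is /etc/pve/lxc/103.conf, and opening it with nano works fine. As a scripted alternative, here is a sketch that backs up the file and then drops the lxc.cgroup2.devices.allow, lxc.mount.entry and lxc.idmap lines; strip_raw_gpu_lines is a hypothetical helper, and the backup path is left to the caller so it can live outside /etc/pve.

```shell
#!/bin/sh
# Sketch: back up a CT config, then remove the raw device-passthrough lines
# (lxc.cgroup2.devices.allow, lxc.mount.entry, lxc.idmap).
# $1 = config file, e.g. /etc/pve/lxc/103.conf; $2 = backup path (required).
strip_raw_gpu_lines() {
  local conf="$1" backup="$2" tmp
  [ -n "$backup" ] || { echo "usage: strip_raw_gpu_lines CONF BACKUP" >&2; return 2; }
  cp -- "$conf" "$backup" || return 1
  tmp=$(mktemp) || return 1
  # grep -v exits 1 when no lines are left and 2 on a real error.
  grep -Ev '^lxc\.(cgroup2\.devices\.allow|mount\.entry|idmap):' "$conf" > "$tmp"
  if [ $? -gt 1 ]; then rm -f "$tmp"; return 1; fi
  # Overwrite the existing file in place instead of renaming a new one over it.
  cat "$tmp" > "$conf"
  rm -f "$tmp"
}
```

For example: strip_raw_gpu_lines /etc/pve/lxc/103.conf /root/103.conf.bak, then pct config 103 to confirm the result.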
 
Thanks. I kept trying the .raw file.
Anyhow, I removed everything I'm 100% certain I added earlier, and it now looks like this:
Bash:
lxc.hook.pre-start: sh -c '[ ! -f /dev/nvidia0 ] && /usr/bin/nvidia-modprobe -c0 -u'
lxc.environment: NVIDIA_VISIBLE_DEVICES=all
lxc.environment: NVIDIA_DRIVER_CAPABILITIES=all
lxc.hook.mount: /usr/share/lxc/hooks/nvidia
arch: amd64
cores: 6
features: nesting=1
hostname: media
memory: 4096
mp0: /mnt/usbstick,mp=/mnt/usbstick
net0: name=eth0,bridge=vmbr0,firewall=1,gw=192.168.2.254,hwaddr=BC:24:11:33:11:54,ip=192.168.2.207/24,type=veth
onboot: 1
ostype: debian
rootfs: media:103/vm-103-disk-0.raw,size=3500G
swap: 2048
unprivileged: 1
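One detail in the pre-start line: /dev/nvidia0 is a character device, and test -f only matches regular files, so [ ! -f /dev/nvidia0 ] is always true and nvidia-modprobe runs on every start. That is harmless, but it is not what the check intends. Testing with -c and joining with || skips the call when the node exists and still lets the hook exit 0; a corrected test joined with && would instead exit non-zero whenever the node already exists, which LXC treats as a failed pre-start. A sketch of the logic, with a hypothetical ensure_nvidia_nodes wrapper:

```shell
#!/bin/sh
# Sketch of the pre-start logic: /dev/nvidia0 is a character device, so test it
# with -c (-f only matches regular files). Using || means the hook returns 0
# when the node already exists instead of failing the container start.
ensure_nvidia_nodes() {
  local dev="${1:-/dev/nvidia0}"
  [ -c "$dev" ] || /usr/bin/nvidia-modprobe -c0 -u
}
```

As a config line, that would read: lxc.hook.pre-start: sh -c '[ -c /dev/nvidia0 ] || /usr/bin/nvidia-modprobe -c0 -u'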
 
Looks okay. Not sure what happened to the disk. What does pct start 103 --debug say now?
 
Bash:
# pct start 103 --debug
run_buffer: 569 Script exited with status 255
lxc_init: 1037 Failed to run lxc.hook.pre-start for container "103"
__lxc_start: 2208 Failed to initialize container "103"
hostid 100000 range 65536
INFO     lsm - ../src/lxc/lsm/lsm.c:lsm_init_static:38 - Initialized LSM security driver AppArmor
INFO     utils - ../src/lxc/utils.c:run_script_argv:585 - Executing script "/usr/share/lxc/hooks/lxc-pve-prestart-hook" for container "103", config section "lxc"
DEBUG    utils - ../src/lxc/utils.c:run_buffer:558 - Script exec /usr/share/lxc/hooks/lxc-pve-prestart-hook '103' 'lxc' 'pre-start' produced output: losetup: /mnt/pve/media/images/103/vm-103-disk-0.raw: Warning: file does not end on a 512-byte sector boundary; the remaining end of the file will be ignored.

DEBUG    utils - ../src/lxc/utils.c:run_buffer:558 - Script exec /usr/share/lxc/hooks/lxc-pve-prestart-hook '103' 'lxc' 'pre-start' produced output: mount: /var/lib/lxc/.pve-staged-mounts/rootfs: cannot mount; probably corrupted filesystem on /dev/loop2.
dmesg(1) may have more information after failed mount system call.

DEBUG    utils - ../src/lxc/utils.c:run_buffer:558 - Script exec /usr/share/lxc/hooks/lxc-pve-prestart-hook '103' 'lxc' 'pre-start' produced output: command 'mount /dev/loop2 /var/lib/lxc/.pve-staged-mounts/rootfs' failed: exit code 32

ERROR    utils - ../src/lxc/utils.c:run_buffer:569 - Script exited with status 255
ERROR    start - ../src/lxc/start.c:lxc_init:1037 - Failed to run lxc.hook.pre-start for container "103"
ERROR    start - ../src/lxc/start.c:__lxc_start:2208 - Failed to initialize container "103"
INFO     utils - ../src/lxc/utils.c:run_script_argv:585 - Executing script "/usr/share/lxcfs/lxc.reboot.hook" for container "103", config section "lxc"
startup for container '103' failed
 
I'm not sure how to fix that disk, and I'm also not sure why what I suggested would have broken it. Let's focus on the GPU for now. The data apparently wasn't important enough. Try with a fresh CT.
 
The container stopped working completely. I tried to get some important data off it so I could at least reuse that in a new container, but it's gone. I think it got corrupted when I tried to shrink its drive... Anyhow, I deleted it and made a new one that boots and runs roughly the same programs as before :3
 
With the toolkit approach, you don't need to (and shouldn't) install the NVIDIA driver/libraries inside the CT.
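With the toolkit approach, the lxc.hook.mount: /usr/share/lxc/hooks/nvidia line makes the hook mount the host's driver libraries and device nodes into the CT when it starts, so NVIDIA packages installed inside the CT are at best redundant and can conflict if their version differs from the host's. A sketch for listing what is actually installed inside the CT so it can be reviewed before purging; installed_only is a hypothetical helper.

```shell
#!/bin/sh
# Sketch, run inside the CT: list NVIDIA-related packages that are actually
# installed there. Review the list before removing anything.

# Reads "<status-abbrev> <package>" lines on stdin and prints the packages
# whose status is fully installed ("ii").
installed_only() {
  awk '$1 == "ii" { print $2 }'
}

if command -v dpkg-query >/dev/null 2>&1; then
  dpkg-query -W -f='${db:Status-Abbrev} ${Package}\n' '*nvidia*' 2>/dev/null | installed_only
fi
```

If it prints anything, apt autopurge with those package names inside the CT removes them; the host side stays untouched.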
 