I rebooted my server this morning (two LXCs, one VM), and the VM will not start with the GPU passed through over PCIe. The VM ID in question is 104. I turned off "start on boot" so that I could actually get the host to a "stable" state, made sure everything is up to date (`apt update` / `apt upgrade`), and, with passthrough disabled, rebuilt the NVIDIA drivers on both the host and in the VM. Nothing works, and nothing has changed in the config as far as I know (the last reboot was about a week ago).
What the heck is happening??
I'm able to view the card from `nvidia-smi` on the host, so I know it "works":
Code:
root@proxmox:~# nvidia-smi
Fri May 9 11:49:13 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.144 Driver Version: 570.144 CUDA Version: 12.8 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce GTX 1660 ... Off | 00000000:27:00.0 Off | N/A |
| 0% 44C P8 12W / 125W | 0MiB / 6144MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
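For reference, a minimal sketch of how to check which kernel driver currently holds the card. Since `nvidia-smi` lists the GPU, the host's `nvidia` module is bound to it at this point, whereas passthrough needs `vfio-pci` to hold the device while the VM runs. The function name `bound_driver` and the `SYSFS_ROOT` variable are made up for illustration; the PCI address is the one from the Bus-Id column above. This is not output from my system.

```shell
#!/bin/bash
# Print the kernel driver currently bound to a PCI device, or "none".
# SYSFS_ROOT is a hypothetical override so the function can be tried
# against a fake directory tree; it defaults to the real /sys.
bound_driver() {
    local addr="$1"
    local link="${SYSFS_ROOT:-/sys}/bus/pci/devices/${addr}/driver"
    if [ -L "$link" ]; then
        # The "driver" entry is a symlink into .../bus/pci/drivers/<name>.
        basename "$(readlink "$link")"
    else
        echo "none"
    fi
}

# Address taken from the nvidia-smi Bus-Id (domain shortened to 4 digits).
bound_driver 0000:27:00.0
```

`lspci -nnk -s 27:00.0` reports the same information on its "Kernel driver in use" line.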
Any interaction with the VM from the command line locks up; for example, if I run `qm start 104`, it can't be interrupted, suspended, etc. Trying to kill the running process does nothing either:
Code:
root@proxmox:~# lsof /var/lock/qemu-server/lock-104.conf
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
task\x20U 4055 root 5wW REG 0,28 0 77 /run/lock/qemu-server/lock-104.conf
root@proxmox:~# ps aux | grep 4055
root 4055 27.3 0.3 229756 118796 pts/0 R+ 11:39 3:06 task UPID:proxmox:00000FD7:00001001:681E4BD0:qmstart:104:root@pam:
root 7950 0.0 0.0 6336 2048 pts/1 S+ 11:50 0:00 grep 4055
root@proxmox:~# pstree 4055
task UPID:proxm
root@proxmox:~#
Code:
root@proxmox:~# cat /etc/pve/qemu-server/104.conf
[...]
agent: enabled=1
args: -object memory-backend-memfd,id=mem,size=8192M,share=on
bios: ovmf
boot: order=scsi0
cores: 6
cpu: EPYC-IBPB
efidisk0: local-lvm:vm-104-disk-0,efitype=4m,size=4M
hostpci0: 0000:27:00.0
localtime: 1
memory: 16384
meta: creation-qemu=9.2.0,ctime=1745094291
name: docker
net0: virtio=02:FF:E6:52:C1:29,bridge=vmbr0
numa: 1
onboot: 0
ostype: l26
scsi0: local-lvm:vm-104-disk-1,discard=on,size=200G,ssd=1
scsi1: local-lvm:vm-104-disk-2,backup=0,cache=writethrough,size=256G
scsihw: virtio-scsi-pci
serial0: socket
smbios1: uuid=5bbe3e4d-cebe-4269-a2aa-e4fb2a2acb64
sockets: 2
tablet: 0
tags: community-script,debian12,docker
usb0: host=8-3
vga: none
vmgenid: ecf2b3c6-4c7a-4c48-9f15-97da478ac861
Logs:
* `dmesg -T`: https://paste.debian.net/hidden/629d3d58/