Can't access VM with GPU Passthrough

qwertyu

Dec 4, 2023
Hi, we have a Debian virtual machine with a graphical environment that uses GPU passthrough. The strange thing is that it was working until now, but for some reason the machine shut down without human intervention, and now it is impossible to start it again.

Code:
Dec  4 17:50:20 pve-4 qm[711524]: VM 218 qmp command failed - VM 218 qmp command 'set_password' failed - unable to connect to VM 218 qmp socket - timeout after 51 retries
Dec  4 17:50:26 pve-4 pvestatd[3365]: VM 218 qmp command failed - VM 218 qmp command 'query-proxmox-support' failed - unable to connect to VM 218 qmp socket - timeout after 51 retries
Dec  4 17:50:28 pve-4 pvedaemon[698510]: VM 218 qmp command failed - VM 218 qmp command 'guest-ping' failed - got timeout
Dec  4 17:50:30 pve-4 pvedaemon[661917]: VM 218 qmp command failed - VM 218 qmp command 'query-proxmox-support' failed - unable to connect to VM 218 qmp socket - timeout after 51 retries

This is the output of pveversion -v:
Bash:
proxmox-ve: 7.2-1 (running kernel: 5.15.107-2-pve)
pve-manager: 7.4-14 (running version: 7.4-14/81b856fa)
pve-kernel-5.15: 7.4-3
pve-kernel-helper: 7.2-13
pve-kernel-5.13: 7.1-9
pve-kernel-5.15.107-2-pve: 5.15.107-2
pve-kernel-5.15.64-1-pve: 5.15.64-1
pve-kernel-5.13.19-6-pve: 5.13.19-15
pve-kernel-5.13.19-2-pve: 5.13.19-4
pve-kernel-5.13.19-1-pve: 5.13.19-3
ceph: 16.2.13-pve1
ceph-fuse: 16.2.13-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx4
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.4
libproxmox-backup-qemu0: 1.3.1-1
libproxmox-rs-perl: 0.2.1
libpve-access-control: 7.4.1
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.4-2
libpve-guest-common-perl: 4.2-4
libpve-http-server-perl: 4.2-3
libpve-rs-perl: 0.7.7
libpve-storage-perl: 7.4-3
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.2-2
lxcfs: 5.0.3-pve1
novnc-pve: 1.4.0-1
proxmox-backup-client: 2.4.2-1
proxmox-backup-file-restore: 2.4.2-1
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-offline-mirror-helper: 0.5.1-1
proxmox-widget-toolkit: 3.7.3
pve-cluster: 7.3-3
pve-container: 4.4-5
pve-docs: 7.4-2
pve-edk2-firmware: 3.20230228-4~bpo11+1
pve-firewall: 4.3-4
pve-firmware: 3.6-5
pve-ha-manager: 3.6.1
pve-i18n: 2.12-1
pve-qemu-kvm: 7.2.0-8
pve-xtermjs: 4.16.0-2
qemu-server: 7.4-3
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+3
vncterm: 1.7-1
zfsutils-linux: 2.1.11-pve1

And this is the config of the VM:
Code:
agent: 1
boot: c
bootdisk: virtio0
cipassword: x
ciuser: x
cores: 8
cpu: host
hostpci0: 0000:2b:00.0
hotplug: disk,network,usb
ide0: none,media=cdrom
ipconfig0: ip=172.17.28.84/24,gw=172.17.28.1
kvm: 1
machine: q35
memory: 65536
meta: creation-qemu=6.1.0,ctime=1641989810
name: eh-node-5
net0: virtio=42:79:42:48:A0:E4,bridge=vmbr1028
numa: 1
ostype: l26
scsihw: virtio-scsi-pci
serial0: socket
smbios1: uuid=fbe11d96-04a0-4d4a-b49f-26472445dcf8
sockets: 2
unused0: saslocaldata:vm-218-cloudinit
unused1: saslocaldata:vm-218-disk-0
vga: virtio
virtio0: ssdlocaldata:vm-218-disk-0,size=200G
virtio1: local-lvm:vm-218-disk-0,iothread=1,size=30G
vmgenid: x

Is there a quick fix without having to reboot the physical server?
 
Is there a quick fix without having to reboot the physical server?
If there are no actual error messages in journalctl from around the time the VM was started, it could be a timeout because not enough contiguous free memory is available (with passthrough, all of the VM's memory must be pinned in host memory). Try starting it with 4 GB of memory instead of the configured 64 GB, just to test this. Rebooting the host usually fixes this, at least temporarily.
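To get a rough idea of whether host memory is fragmented before trying anything else, something like the following could help. It is only a sketch: the helper name free_mib_in_large_blocks is made up for illustration, it assumes 4 KiB pages (normal on x86_64), and it reads text in the format of /proc/buddyinfo, where the columns after the zone name are counts of free blocks of order 0, 1, 2 and so on. A low result suggests fragmentation but does not prove that this is why the VM fails to start.

```shell
# Hypothetical helper: sum the free memory (in MiB) that sits in blocks of
# at least 2^min_order pages. Reads /proc/buddyinfo-formatted text on stdin.
# Assumes 4 KiB pages, so order 9 corresponds to 2 MiB blocks.
free_mib_in_large_blocks() {
    min_order="${1:-9}"
    awk -v min="$min_order" '
        {
            # Fields 1-4 are "Node", "<n>,", "zone", "<name>";
            # field 5 is the order-0 count, field 6 is order 1, and so on.
            for (i = 5; i <= NF; i++) {
                order = i - 5
                if (order >= min) kib += $i * (2 ^ order) * 4
            }
        }
        END { printf "%d\n", kib / 1024 }
    '
}
```

On the host it would be called as `free_mib_in_large_blocks 9 < /proc/buddyinfo`. The memory test suggested above can then be done with `qm set 218 --memory 4096` followed by `qm start 218`; if the VM comes up, set the memory back to 65536 afterwards. To look for errors around a start attempt, `journalctl --since "2023-12-04 17:45" --until "2023-12-04 17:55"` narrows the log to that window (adjust the times to your own attempt).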