GPU passthrough prevents reboot of Ubuntu VM

rakali

I was using a Windows VM for playback of YouTube and Plex, but there were performance issues with dropped frames and out-of-sync audio. As a test I switched to Ubuntu 19.10, and playback performance is now perfect. With Windows, I was able to reboot the VM without issue.

With Ubuntu, after a cold start of the host I can start the VM, but a reboot, or any subsequent stop and start, results in a lockup and timeout.

The VM appears to be half started: CPU activity is pegged, but it is unresponsive. There is no network and no response from the qemu guest agent. The only remedy is to restart the whole machine.

An AMD RX 580 GPU is passed through via PCI passthrough and connected to a 1080p@60Hz monitor over HDMI.


Any advice on tweaks that might improve the situation?

When I run qm showcmd 100 | bash, nothing is returned.
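If it helps with debugging, the generated command can also be dumped to a file and inspected before running it, rather than piping it straight into bash (the /tmp path is just an example):

Code:
# Dump the generated KVM command line for VM 100 to a file and review it
qm showcmd 100 > /tmp/vm100-cmd.sh
less /tmp/vm100-cmd.sh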

syslog has these lines
Code:
Jan 26 07:47:31 proxmox pvedaemon[2559]: VM 100 qmp command failed - VM 100 qmp command 'guest-ping' failed - unable to connect to VM 100 qga socket - timeout after 31 retries
Jan 26 07:47:35 proxmox ssh[3329]: debug3: send packet: type 80
Jan 26 07:47:35 proxmox ssh[3329]: debug3: receive packet: type 82
Jan 26 07:47:37 proxmox kernel: vfio-pci 0000:01:00.0: not ready 65535ms after FLR; giving up
Jan 26 07:47:37 proxmox kernel: vfio-pci 0000:01:00.0: vfio_cap_init: hiding cap 0xff@0xff
Jan 26 07:47:37 proxmox kernel: vfio-pci 0000:01:00.0: vfio_cap_init: hiding cap 0xff@0xff
Jan 26 07:47:37 proxmox kernel: vfio-pci 0000:01:00.0: vfio_cap_init: hiding cap 0xff@0xff
Jan 26 07:47:37 proxmox kernel: vfio-pci 0000:01:00.0: vfio_cap_init: hiding cap 0xff@0xff
Jan 26 07:47:37 proxmox kernel: vfio-pci 0000:01:00.0: vfio_cap_init: hiding cap 0xff@0xff
Jan 26 07:47:37 proxmox kernel: vfio-pci 0000:01:00.0: vfio_cap_init: hiding cap 0xff@0xff
Jan 26 07:47:37 proxmox kernel: vfio-pci 0000:01:00.0: vfio_ecap_init: hiding ecap 0xffff@0x100
Jan 26 07:47:37 proxmox kernel: vfio-pci 0000:01:00.0: vfio_ecap_init: hiding ecap 0xffff@0xffc
Jan 26 07:47:37 proxmox kernel: vfio-pci 0000:01:00.0: vfio_ecap_init: hiding ecap 0xffff@0xffc
Jan 26 07:47:37 proxmox kernel: vfio-pci 0000:01:00.0: vfio_ecap_init: hiding ecap 0xffff@0xffc
Jan 26 07:47:37 proxmox kernel: vfio-pci 0000:01:00.1: Refused to change power state, currently in D3
Jan 26 07:47:37 proxmox kernel: vfio-pci 0000:01:00.1: Refused to change power state, currently in D3
Jan 26 07:47:37 proxmox kernel: vfio-pci 0000:01:00.1: vfio_cap_init: hiding cap 0xff@0xff
Jan 26 07:47:37 proxmox kernel: vfio-pci 0000:01:00.1: vfio_cap_init: hiding cap 0xff@0xff
Jan 26 07:47:37 proxmox kernel: vfio-pci 0000:01:00.1: vfio_ecap_init: hiding ecap 0xffff@0x100
Jan 26 07:47:37 proxmox kernel: vfio-pci 0000:01:00.1: vfio_ecap_init: hiding ecap 0xffff@0xffc
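The "not ready 65535ms after FLR; giving up" line suggests the GPU never comes back from a Function Level Reset, which matches the well-known AMD reset problem. As a rough check (PCI address taken from the log above), whether the card even advertises FLR can be inspected with:

Code:
# Show the GPU's PCIe device capabilities; 'FLReset+' means it claims to
# support Function Level Reset, 'FLReset-' means it does not
lspci -vvv -s 01:00.0 | grep -i flreset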

UBUNTU VM:

I have the qemu-guest-agent package installed, but when I run qm agent 100 ping there is no response. Does that mean it is working? qm agent 100 network-get-interfaces returns the correct information, so I guess it is fine.
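As far as I can tell, qm agent <vmid> ping prints nothing when it succeeds, so checking the exit code is one way to confirm the agent is actually answering:

Code:
# A zero exit code means the guest agent replied to guest-ping
qm agent 100 ping && echo "agent reachable" || echo "agent not responding"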

Code:
# cat /etc/pve/qemu-server/100.conf
agent: 1
balloon: 0
bios: ovmf
boot: c
bootdisk: scsi0
cores: 4
cpu: host
efidisk0: local-zfs:vm-100-disk-0,size=1M
hostpci0: 01:00,pcie=1,x-vga=1
machine: q35
memory: 4096
name: htpc
net0: virtio=,bridge=vmbr1
numa: 1
onboot: 1
ostype: l26
scsi0: AGGRETSUKO:vm-100-disk-0,cache=unsafe,discard=on,replicate=0,size=200G,ssd=1
scsihw: virtio-scsi-pci
smbios1: uuid=
sockets: 1
tablet: 0
vga: none
vmgenid:
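For completeness, the IOMMU grouping of the GPU (01:00.0) and its HDMI audio function (01:00.1) can be listed like this; nothing here is specific to my setup beyond the PCI addresses:

Code:
# List every PCI device together with the IOMMU group it belongs to;
# the GPU and its audio function should not share a group with unrelated devices
find /sys/kernel/iommu_groups/ -type l | sort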

I have also tried the following hookscript, to no effect.

Code:
#!/usr/bin/env bash

# Proxmox hookscript: $1 is the VMID, $2 is the phase
if [ "$2" = "pre-start" ]
then
    # First release the GPU from its current driver (if any is bound), by PCI bus ID
    if [ -e /sys/bus/pci/devices/0000:01:00.0/driver ]; then
        echo 0000:01:00.0 > /sys/bus/pci/devices/0000:01:00.0/driver/unbind
    fi
    # Then register the card's vendor/device ID (1002:67df, RX 580) with vfio-pci
    echo 1002 67df > /sys/bus/pci/drivers/vfio-pci/new_id
fi
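A related workaround that comes up for cards that do not survive a reset is to remove and rescan the device in a post-stop hook. This is only a sketch of that idea (same PCI addresses as above, untested here):

Code:
#!/usr/bin/env bash

# Sketch: on post-stop, drop the GPU and its audio function from the PCI bus
# and rescan, so the next start sees a freshly enumerated device
if [ "$2" = "post-stop" ]
then
    echo 1 > /sys/bus/pci/devices/0000:01:00.0/remove
    echo 1 > /sys/bus/pci/devices/0000:01:00.1/remove
    echo 1 > /sys/bus/pci/rescan
fi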

Code:
# pveversion -v
proxmox-ve: 6.1-2 (running kernel: 5.3.13-1-pve)
pve-manager: 6.1-5 (running version: 6.1-5/9bf06119)
pve-kernel-5.3: 6.1-1
pve-kernel-helper: 6.1-1
pve-kernel-5.0: 6.0-11
pve-kernel-5.3.13-1-pve: 5.3.13-1
pve-kernel-5.0.21-5-pve: 5.0.21-10
pve-kernel-5.0.21-2-pve: 5.0.21-7
pve-kernel-5.0.15-1-pve: 5.0.15-1
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.2-pve4
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.13-pve1
libpve-access-control: 6.0-5
libpve-apiclient-perl: 3.0-2
libpve-common-perl: 6.0-10
libpve-guest-common-perl: 3.0-3
libpve-http-server-perl: 3.0-3
libpve-storage-perl: 6.1-3
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve3
lxc-pve: 3.2.1-1
lxcfs: 3.0.3-pve60
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.1-2
pve-cluster: 6.1-3
pve-container: 3.0-16
pve-docs: 6.1-3
pve-edk2-firmware: 2.20191127-1
pve-firewall: 4.0-9
pve-firmware: 3.0-4
pve-ha-manager: 3.0-8
pve-i18n: 2.0-3
pve-qemu-kvm: 4.1.1-2
pve-xtermjs: 3.13.2-1
qemu-server: 6.1-4
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.2-pve2
 
I changed the machine type to i440fx and removed pcie=1 from the hostpci0 line, and everything works fine, so this is something to do with q35.
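For reference, the change is roughly equivalent to this (sketch using qm set; adjust to your own VM ID and passthrough options):

Code:
# Drop the explicit q35 machine type (falls back to the default i440fx)
qm set 100 --delete machine
# Re-add the passthrough entry without pcie=1
qm set 100 --hostpci0 01:00,x-vga=1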

There are plenty of threads talking about the changes in Proxmox 6 / QEMU 4, where adding args with kernel_irqchip=on and setting the machine type to pc-q35-3.1 helped. In my case it did not.
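For anyone comparing notes, the variants I tried looked roughly like the following config lines (approximate, from memory):

Code:
# pin the older q35 machine version
machine: pc-q35-3.1
# and/or force the kernel irqchip via extra arguments
args: -machine type=q35,kernel_irqchip=on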
 
