Hey all,
This is my first time posting, so apologies up front if content or protocol is found lacking.
I recently upgraded (via a fresh install) from Proxmox VE 5.4 to 6.2 and have been setting up an Ubuntu 20.04 VM with GPU passthrough. That part was successful: I can use the VM directly through the GPU and a passed-through USB device, and during normal operation everything works as expected. I was even able to enable nested virtualization for Android emulation.
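For reference, enabling nested virtualization was just the standard kvm_intel toggle; from memory it was roughly the following (the exact file name is my guess, and it needs a reboot to take effect):

# /etc/modprobe.d/kvm-intel.conf
options kvm_intel nested=Y

# confirm it took:
cat /sys/module/kvm_intel/parameters/nested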
However, I am running into a (big) problem: stopping, restarting, or shutting down the Ubuntu VM crashes the entire server. I'm talking a full freeze-up, fans at 100%, followed by the horrible motherboard chime that tells you something went terribly wrong. My only guess is that this is related to the GPU, but that is truly just a guess; I'm not sure how it could work so well during normal operation and only blow up on shutdown. I haven't been able to find any concrete issues in the logs.
I've included logs and configuration below. Can anyone point me in the right direction on how to fix, or even just troubleshoot, this issue? I'm not a sysadmin, and I only have a novice understanding of system-level Linux and kernel mechanics.
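Are these even the right kinds of checks to be running? This is roughly the extent of what I know to look at (08:00 being the RX 480, per the config below):

# confirm the IOMMU is active
dmesg | grep -e DMAR -e IOMMU

# confirm the card and its HDMI audio function are bound to vfio-pci
lspci -nnk -s 08:00

# watch the host journal live while triggering the VM shutdown
journalctl -f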
Machine Details
Base: HP Z420
CPU: Xeon E5-1650 (6C/12T @ 3.2GHz)
Memory: 32GB Unregistered ECC DDR3
PVE Install Drive: 120GB Kingston SSD
VM Install Drive: 500GB Crucial SSD
GPU (passthrough): Gigabyte RX 480
GPU (unused): Nvidia Quadro NVS 450
VM Configuration
OS: Ubuntu 20.04
AMD Driver: Default (built-in)
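The rest of this section is the VM's config as Proxmox stores it; if I remember right, it's just the output of:

qm config 111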
agent: 1
args: -cpu 'host,+kvm_pv_unhalt,+kvm_pv_eoi,hv_vendor_id=NV43FIX,kvm=off'
balloon: 4096
bios: ovmf
boot: cdn
bootdisk: scsi0
cores: 8
cpu: host,hidden=1,flags=+pcid
efidisk0: vm_storage:vm-111-disk-1,size=128K
hostpci0: 08:00,pcie=1,x-vga=1
ide2: none,media=cdrom
machine: q35
memory: 16384
name: Ubuntu-20.04
net0: virtio=66:B7:75:4D:B4:77,bridge=vmbr0,firewall=1
numa: 1
onboot: 1
ostype: l26
scsi0: vm_storage:vm-111-disk-0,size=128G
scsihw: virtio-scsi-single
smbios1: uuid=aff5d1c4-8802-4497-8f7d-b8cf61ad7cf8
sockets: 1
usb0: host=046d:c52b
vmgenid: 32dd9ed3-6969-4bdc-a600-6cc051bdbeaf
/var/log/syslog
This is the log from simply starting the server, logging into the web interface, and trying to shut down the VM. The last entry before the crash is the successful login event, followed by a run of null characters; everything after that is from booting the machine back up.
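If it would help, I can also try pulling kernel messages from the crashed boot with something like the command below, although I believe the journal is not persistent by default on this install, so there may be nothing left from before the reset:

journalctl -k -b -1

Anyway, here is what syslog captured: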
Jul 20 17:36:40 pve pve-guests[1792]: <root@pam> end task UPID:pve:00000715:00000560:5F160E28:startall::root@pam: OK
Jul 20 17:36:40 pve systemd[1]: Started PVE guests.
Jul 20 17:36:40 pve systemd[1]: Reached target Multi-User System.
Jul 20 17:36:40 pve systemd[1]: Reached target Graphical Interface.
Jul 20 17:36:40 pve systemd[1]: Starting Update UTMP about System Runlevel Changes...
Jul 20 17:36:40 pve systemd[1]: systemd-update-utmp-runlevel.service: Succeeded.
Jul 20 17:36:40 pve systemd[1]: Started Update UTMP about System Runlevel Changes.
Jul 20 17:36:40 pve systemd[1]: Startup finished in 23.384s (firmware) + 6.275s (loader) + 3.288s (kernel) + 1min 14.601s (userspace) = 1min 47.550s.
Jul 20 17:36:41 pve systemd-timesyncd[1345]: Timed out waiting for reply from [2001:418:8405:4002::3]:123 (2.debian.pool.ntp.org).
Jul 20 17:36:41 pve systemd-timesyncd[1345]: Synchronized to time server for the first time 64.22.253.155:123 (2.debian.pool.ntp.org).
Jul 20 17:37:00 pve systemd[1]: Starting Proxmox VE replication runner...
Jul 20 17:37:01 pve systemd[1]: pvesr.service: Succeeded.
Jul 20 17:37:01 pve systemd[1]: Started Proxmox VE replication runner.
Jul 20 17:38:00 pve systemd[1]: Starting Proxmox VE replication runner...
Jul 20 17:38:01 pve systemd[1]: pvesr.service: Succeeded.
Jul 20 17:38:01 pve systemd[1]: Started Proxmox VE replication runner.
Jul 20 17:38:27 pve pvedaemon[1690]: <root@pam> successful auth for user 'root@pam'
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@Jul 20 17:57:36 pve kernel: [ 0.000000] Linux version 5.4.34-1-pve (build@pve) (gcc version 8.3.0 (Debian 8.3.0-6)) #1 SMP PVE 5.4.34-2 (Thu, 07 May 2020 10:02:02 +0200) ()
Jul 20 17:57:36 pve kernel: [ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-5.4.34-1-pve root=/dev/mapper/pve-root ro quiet intel_iommu=on
Jul 20 17:57:36 pve kernel: [ 0.000000] KERNEL supported cpus:
Jul 20 17:57:36 pve kernel: [ 0.000000] Intel GenuineIntel
Jul 20 17:57:36 pve kernel: [ 0.000000] AMD AuthenticAMD
Jul 20 17:57:36 pve systemd-modules-load[468]: Module 'vfio' is builtin
Jul 20 17:57:36 pve systemd-modules-load[468]: Module 'vfio_iommu_type1' is builtin
Jul 20 17:57:36 pve kernel: [ 0.000000] Hygon HygonGenuine
Jul 20 17:57:36 pve systemd-modules-load[468]: Module 'vfio_pci' is builtin
Jul 20 17:57:36 pve kernel: [ 0.000000] Centaur CentaurHauls
Jul 20 17:57:36 pve systemd-modules-load[468]: Module 'vfio_virqfd' is builtin
Jul 20 17:57:36 pve kernel: [ 0.000000] zhaoxin Shanghai
Jul 20 17:57:36 pve lvm[466]: 1 logical volume(s) in volume group "pve" monitored
Jul 20 17:57:36 pve kernel: [ 0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
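Since syslog only captures null bytes at the moment of the crash, I have been wondering whether netconsole would catch whatever the kernel prints as it dies. Something along these lines is what I had in mind; the ports, IPs, NIC name, and MAC below are placeholders, not my actual setup:

# on the crashing host: stream kernel messages over UDP to another box
modprobe netconsole netconsole=6665@192.168.1.10/eno1,6666@192.168.1.20/aa:bb:cc:dd:ee:ff

# on the receiving box: listen for them
nc -u -l 6666

Is that worth trying here, or is there a better way to capture a panic like this?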
/etc/modprobe.d/vfio.conf
options vfio-pci ids=1002:67df,1002:aaf0 disable_vga=1
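For reference, those two IDs are the RX 480's video and HDMI-audio functions; they should match what lspci reports for the card, roughly:

lspci -nn -s 08:00
# 08:00.0 VGA compatible controller [0300]: ... Ellesmere [Radeon RX 470/480/...] [1002:67df]
# 08:00.1 Audio device [0403]: ... Ellesmere HDMI Audio [1002:aaf0]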
/etc/modprobe.d/blacklist.conf
blacklist radeon
blacklist nouveau
blacklist nvidia
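If it matters, I'm fairly sure I regenerated the initramfs after editing the two files above and rebooted:

update-initramfs -u -k all

and the boot log above does show the vfio modules being picked up (reported as builtin on this kernel), so I believe these settings are actually in effect.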