Trouble with graphics acceleration after reinstalling onto pve8

FireStormOOO

Dec 15, 2023
I recently rebuilt my Proxmox host on PVE 8 after an update broke my boot and/or corrupted my root. Because of those problems, I chose to install from a Debian 12 ISO and add the Proxmox packages only after configuring my boot and ZFS root. GRUB does not play nicely with an encrypted root on ZFS, so I am using ZFSBootMenu instead for simplicity and easier root snapshots. I completed the install without issue and recovered my old configuration, and almost everything is working again except graphics acceleration and SPICE remoting. On the previous install (a now-defunct PVE 7 install on the same hardware) I had jumped through the hoops to get the NVIDIA proprietary drivers installed, but I was hoping to give the open source drivers a go this time unless those are known not to work.

So far I have:
-verified that lspci sees both the NVIDIA GTX 1060 GPU and the AMD iGPU
-checked for loaded drivers; I think I see drivers loaded for the NVIDIA card
-checked that some of the packages I should need for VirGL are installed; I was unable to locate a comprehensive list, but I assume Proxmox includes all the dependencies?
-tested that VMs, new or recovered from the previous install, work with the default graphics option
-tested that VMs, new or recovered, fail to boot with VirGL selected. GRUB displays using the VGA emulation, but once the actual boot starts, the systems hang with the following before shutting themselves off a few seconds later:
Screenshot from 2023-12-14 20-38-40.png
-tried an important recovered VM with SPICE graphics selected; it just shows "Guest has not initialized the display (yet)" and the guest hangs indefinitely without booting (it never becomes pingable either). The guest does boot if the default graphics is selected. It previously worked with all three options, and I was using VirGL before the rebuild, which now also fails (see above).
-completed an install of a new Debian 11 guest with SPICE graphics selected. Oddly, it flashes the same errors as the VirGL machines do, but boot otherwise completes normally. Excerpt from dmesg:
[ 0.712055] ACPI: \_SB_.GSIF: Enabled at IRQ 21
[ 0.712716] shpchp 0000:05:01.0: pci_hp_register failed with error -16
[ 0.712727] shpchp 0000:05:01.0: Slot initialization failed
[ 0.713450] shpchp 0000:05:02.0: HPC vendor_id 1b36 device_id 1 ss_vid 0 ss_did 0
[ 0.713535] ACPI: \_SB_.GSIG: Enabled at IRQ 22
[ 0.714187] shpchp 0000:05:02.0: pci_hp_register failed with error -16
[ 0.714189] shpchp 0000:05:02.0: Slot initialization failed
[ 0.714909] shpchp 0000:05:03.0: HPC vendor_id 1b36 device_id 1 ss_vid 0 ss_did 0
[ 0.714993] ACPI: \_SB_.GSIH: Enabled at IRQ 23
[ 0.715632] shpchp 0000:05:03.0: pci_hp_register failed with error -16
[ 0.715634] shpchp 0000:05:03.0: Slot initialization failed
[ 0.716365] shpchp 0000:05:04.0: HPC vendor_id 1b36 device_id 1 ss_vid 0 ss_did 0
[ 0.716449] ACPI: \_SB_.GSIE: Enabled at IRQ 20
[ 0.717087] shpchp 0000:05:04.0: pci_hp_register failed with error -16
[ 0.717090] shpchp 0000:05:04.0: Slot initialization failed
[ 0.717482] shpchp: Standard Hot Plug PCI Controller Driver version: 0.4
[ 0.717703] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
[ 0.717934] Linux agpgart interface v0.103
[ 0.717991] AMD-Vi: AMD IOMMUv2 functionality not available on this system - This is not a bug.
It is unclear from context whether the last line is related to the errors above. The PCI device referenced above is the "PCI bridge: Red Hat, Inc. QEMU PCI-PCI bridge". The machines referenced above all use UEFI boot with the q35 machine type, though I didn't notice any differences with BIOS-based VMs.
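The host-side checks from the list above can be sketched roughly as follows. This is only an illustration, not the exact commands that were run: the helper names (gpu_drivers, check_render_nodes, check_packages) are made up for this example, and the package names libgl1 and libegl1 are an assumption about which host libraries VirGL needs, so treat that list as a starting point rather than a complete dependency list.

```shell
#!/bin/sh
# Rough host-side sanity checks for VirGL (a sketch; run on the Proxmox host).

# Filter `lspci -nnk`-style output on stdin down to display-class devices
# plus their indented detail lines (including "Kernel driver in use").
gpu_drivers() {
    awk '
        /^[0-9a-f]/ { show = ($0 ~ /VGA compatible controller|3D controller|Display controller/) }
        show
    '
}

# Report whether the host exposes a DRM render node such as renderD128.
check_render_nodes() {
    if ls /dev/dri/renderD* >/dev/null 2>&1; then
        ls -l /dev/dri/
    else
        echo "no render node found under /dev/dri"
    fi
}

# Print the install status of each package name given as an argument.
check_packages() {
    for pkg in "$@"; do
        if dpkg-query -W -f='${Status}' "$pkg" 2>/dev/null | grep -q 'install ok installed'; then
            echo "ok       $pkg"
        else
            echo "MISSING  $pkg"
        fi
    done
}

# Run the checks, skipping any tool that is not available.
if command -v lspci >/dev/null 2>&1; then
    lspci -nnk | gpu_drivers
fi
check_render_nodes
# ASSUMPTION: libgl1 and libegl1 are the host libraries VirGL depends on.
check_packages libgl1 libegl1
```

A GPU entry with no "Kernel driver in use" line, or a missing render node, would point at the host driver rather than at the guest configuration.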

My Google-fu isn't turning up much of anything for leads and none of the logs I've searched pull up anything relevant. Hoping someone here might have some ideas.
 
Some more testing: I installed another very similar node using the same install procedure and wasn't able to reproduce the issue. That machine has an NVIDIA GPU one generation newer and an AMD CPU one generation older. VirGL graphics acceleration worked there without any special tinkering, using the stock drivers.
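With one working and one failing node built the same way, a side-by-side comparison of kernel, package versions, and loaded GPU modules may help narrow things down. The following is a minimal sketch under assumptions: snapshot_node and compare_snapshots are hypothetical helper names, the file paths are placeholders, and the set of facts captured is only a guess at what might matter.

```shell
#!/bin/sh
# Capture a few GPU-relevant facts about the current node into a text file.
snapshot_node() {
    out=$1
    {
        echo "== kernel =="
        uname -r
        echo "== pve packages =="
        if command -v pveversion >/dev/null 2>&1; then
            pveversion -v
        else
            echo "pveversion not found"
        fi
        echo "== gpu modules =="
        if command -v lsmod >/dev/null 2>&1; then
            # Print only module names so differing sizes do not clutter the diff.
            lsmod | awk '$1 ~ /^(nouveau|nvidia|amdgpu)/ { print $1 }' | sort
        fi
    } > "$out"
}

# Show a unified diff of two snapshot files and summarize the result.
compare_snapshots() {
    if diff -u "$1" "$2"; then
        echo "snapshots are identical"
    else
        echo "snapshots differ (see lines above)"
    fi
}
```

For example, run snapshot_node /tmp/gpu-facts.txt on each node, copy one file across, and pass both files to compare_snapshots.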

Those PCIe errors pictured above are definitely a red herring; they also appear on the new fully working node and on all my other VMs.

The VM I was most worried about started booting OK with the spice/QXL display. I don't think I changed anything, I just booted it fully with the default display, tried again with spice/QXL and it worked. Still no joy on VirGL.
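Switching a VM between display types for this kind of test can also be done from the host shell with qm instead of the GUI. A minimal sketch, assuming a placeholder VMID of 100; set_display and show_display are hypothetical helper names, and the list of accepted types is deliberately limited to the ones discussed here.

```shell
#!/bin/sh
# Set the display type of a VM. With DRY_RUN=1 the command is only printed.
set_display() {
    vmid=$1
    type=$2
    case "$type" in
        std|qxl|virtio|virtio-gl) ;;
        *)
            echo "unsupported display type: $type" >&2
            return 1
            ;;
    esac
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "qm set $vmid --vga $type"
    else
        qm set "$vmid" --vga "$type"
    fi
}

# Show the configured display type; no vga line means the default is in use.
show_display() {
    qm config "$1" | grep '^vga:' || echo "vga: (default)"
}
```

For example, set_display 100 std, boot the guest once, shut it down, then set_display 100 qxl reproduces the sequence described above, and set_display 100 virtio-gl switches to VirGL.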
 
