I recently rebuilt my Proxmox host on PVE 8 after an update broke my boot and/or corrupted my root. Because of those problems, I chose to install from a Debian 12 ISO and add the Proxmox packages only after configuring my boot and ZFS root; since GRUB does not play nicely with an encrypted root on ZFS, I'm using ZFSBootMenu instead for simplicity and easier root snapshots. The install completed without issue, I recovered my old configuration, and almost everything is working again, with the exception of graphics acceleration and SPICE remoting. On my previous (now defunct) PVE 7 install on the same hardware, I had jumped through the hoops to get the proprietary NVIDIA drivers installed, but I was hoping to give the open-source drivers a go this time, unless those are known not to work.
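For what it's worth, the Proxmox-on-Debian part followed the standard wiki procedure; roughly this (repo line, key path, and package names are from memory of the Bookworm instructions, so double-check against the wiki before copying):

# add the Proxmox VE no-subscription repository
echo "deb [arch=amd64] http://download.proxmox.com/debian/pve bookworm pve-no-subscription" > /etc/apt/sources.list.d/pve-install-repo.list
# fetch the repository signing key
wget https://enterprise.proxmox.com/debian/proxmox-release-bookworm.gpg -O /etc/apt/trusted.gpg.d/proxmox-release-bookworm.gpg
apt update && apt full-upgrade
# install the Proxmox kernel first and reboot into it
apt install proxmox-default-kernel
# then the rest of the stack
apt install proxmox-ve postfix open-iscsi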
So far I have:
-verified that lspci sees both the NVIDIA GTX 1060 and the AMD iGPU
-checked that a kernel driver appears to be loaded for the NVIDIA card (the exact commands for these checks are at the bottom of this post)
-checked that some of the packages I should need for VirGL to work are installed; I was unable to locate a comprehensive list, but I assume Proxmox pulls in all the dependencies? (package check is also at the bottom of this post)
-tested that VMs, whether new or recovered from the previous install, work with the default graphics option
-tested that VMs, new or recovered, fail to boot with VirGL selected: GRUB displays fine via the emulated VGA, but once the actual boot starts, the systems hang with the errors shown in the dmesg excerpt below, then shut themselves off a few seconds later
-tried to use an important recovered VM with SPICE graphics selected: the console just shows "Guest has not initialized the display (yet)" and the guest hangs indefinitely without booting (it never becomes pingable either). The guest does boot when the default graphics option is selected, and it previously worked with all three display types; I was using VirGL before the rebuild, which is now also failing (see above)
-was able to complete an install of a new Debian 11 guest with SPICE graphics selected. Oddly, it flashes the same errors as the VirGL machines do, but boot otherwise completes normally. Excerpt from dmesg:
[ 0.712055] ACPI: \_SB_.GSIF: Enabled at IRQ 21
[ 0.712716] shpchp 0000:05:01.0: pci_hp_register failed with error -16
[ 0.712727] shpchp 0000:05:01.0: Slot initialization failed
[ 0.713450] shpchp 0000:05:02.0: HPC vendor_id 1b36 device_id 1 ss_vid 0 ss_did 0
[ 0.713535] ACPI: \_SB_.GSIG: Enabled at IRQ 22
[ 0.714187] shpchp 0000:05:02.0: pci_hp_register failed with error -16
[ 0.714189] shpchp 0000:05:02.0: Slot initialization failed
[ 0.714909] shpchp 0000:05:03.0: HPC vendor_id 1b36 device_id 1 ss_vid 0 ss_did 0
[ 0.714993] ACPI: \_SB_.GSIH: Enabled at IRQ 23
[ 0.715632] shpchp 0000:05:03.0: pci_hp_register failed with error -16
[ 0.715634] shpchp 0000:05:03.0: Slot initialization failed
[ 0.716365] shpchp 0000:05:04.0: HPC vendor_id 1b36 device_id 1 ss_vid 0 ss_did 0
[ 0.716449] ACPI: \_SB_.GSIE: Enabled at IRQ 20
[ 0.717087] shpchp 0000:05:04.0: pci_hp_register failed with error -16
[ 0.717090] shpchp 0000:05:04.0: Slot initialization failed
[ 0.717482] shpchp: Standard Hot Plug PCI Controller Driver version: 0.4
[ 0.717703] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
[ 0.717934] Linux agpgart interface v0.103
[ 0.717991] AMD-Vi: AMD IOMMUv2 functionality not available on this system - This is not a bug.
It's unclear from context whether that last AMD-Vi line is related to the shpchp errors above it. The PCI device referenced in those errors is the "PCI bridge: Red Hat, Inc. QEMU PCI-PCI bridge". The machines referenced above are all UEFI boots on the q35 machine type, though I wasn't noticing any differences with BIOS-based VMs.
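For reference, the GPU and driver checks in the list above amounted to the following (the output looked sane to me, with a driver bound to each card; as I understand it, VirGL also needs a working DRM render node on the host):

# which GPUs exist and which kernel driver each is bound to
lspci -nnk | grep -EA3 'VGA|3D|Display'
# relevant modules loaded on the host
lsmod | grep -Ei 'nouveau|nvidia|amdgpu'
# check that a render node exists for VirGL to use
ls -l /dev/dri/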
My Google-fu isn't turning up much in the way of leads, and none of the logs I've searched turn up anything relevant. Hoping someone here might have some ideas.
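For completeness, here's how I did the package check and the display-type switching. VM ID 100 is just an example, and libgl1/libegl1 are the only host-side VirGL dependencies I could find mentioned in the PVE admin guide (hence my question about a comprehensive list):

# host libraries the PVE docs mention for VirGL support
dpkg -l libgl1 libegl1
# switch a VM's display type from the CLI (100 = example VM ID)
qm set 100 --vga virtio-gl    # VirGL
qm set 100 --vga qxl          # SPICE
qm set 100 --vga std          # plain emulated VGA (the default option)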