This appears to work for now; the only change I made was to UNCOMMENT the "GRUB_TERMINAL=console" line.
GRUB_CMDLINE_LINUX_DEFAULT="ixgbe.allow_unsupported_sfp=1 iommu=pt intel_iommu=on nomodeset video=vesafb:off video=efifb:off"
pve-manager/8.0.4/d258a813cfa6b390 (running kernel: 6.2.16-12-pve)
Dual Intel Xeon X5690 CPUs (also tested with Intel 26xx v1/v2 series)
Nvidia GT720 VGA output (also tested with motherboard VGA (Matrox) output)
Using VFIO to pass through cards to several VMs.
Problem: As soon as the console booting...
Looking at the kernel messages for the WX 4100/RX 550/RX 560 in the Ubuntu guest, I see only one significant difference:
[ 10.409863] amdgpu 0000:06:10.0: amdgpu: Using BACO for runtime pm
Maybe onto something?
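Comparing kernel messages between a working and a non-working card by eye is error-prone, because timestamps and PCI addresses differ on every line. The sketch below strips both before diffing, so only genuinely different amdgpu messages remain. It assumes each guest's `dmesg` output has been saved as plain text (file names such as rx560.txt and rx550.txt are hypothetical), and the helper names are made up for this example.

```python
import re

# Matches the "[   10.409863] " timestamp prefix dmesg puts on each line.
_TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")
# PCI addresses like 0000:06:10.0 differ between guests/slots, so mask them.
_PCI_ADDR = re.compile(r"\b[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]\b")


def normalize(line: str) -> str:
    """Drop the timestamp and replace PCI addresses with a placeholder."""
    line = _TIMESTAMP.sub("", line.strip())
    return _PCI_ADDR.sub("<pci>", line)


def unique_lines(log_a: str, log_b: str, keyword: str = "amdgpu") -> tuple[list[str], list[str]]:
    """Return keyword lines found only in log_a, and those found only in log_b."""
    a = {normalize(l) for l in log_a.splitlines() if keyword in l}
    b = {normalize(l) for l in log_b.splitlines() if keyword in l}
    return sorted(a - b), sorted(b - a)
```

For example, `only_560, only_550 = unique_lines(open("rx560.txt").read(), open("rx550.txt").read())` lists the messages each card produced that the other did not.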
It appears to be some sort of interaction between the kernel, KVM, Ubuntu, and the AMD drivers.
Pulled spare hardware:
* installed a fresh Proxmox 7.4
* tried both the 5.15 and 5.19 kernels
* installed both the rx560 & rx550 in same server, vendor-reset, etc
* Ubuntu 22 guest, AMD 5.5...
Have done that as well, hoping it would make a difference. No change in behavior. (I tried several of the hookscripts; no change.)
@aaron: Thank you for providing a specific workaround. However, this feels like one of those "old" ideas in desperate need of an update.
I challenge you: when is rebooting the entire cluster on the loss of a network element the preferred behavior? You're taking a communication issue and...
Here's the complaint INSIDE the VM when I run, for example, ffmpeg with AMD support compiled in (the build that comes with Jellyfin). Afterwards the VM must be powered off, because it hangs on PCI when I try to shut it down.
[ 74.958805] BUG: kernel NULL pointer dereference, address: 00000000000000d8
Almost identical config to the OP's, except "AMD-Vi: Interrupt remapping enabled". Same blacklists, same VFIO, same kernel switches. (proxmox-ve: 7.4-1 (running kernel: 5.15.108-1-pve))
Same card(s), same issue. The RX 560 (1002:67ef) works but the RX 550 (1002:699f) does not. Same physical machine(s) -...
I did another lab cluster (5 nodes, still on 7.4) and upgraded Ceph to Quincy in preparation for the upgrade to 8.0. I ran into a serious blocker following the directions above. (https://pve.proxmox.com/wiki/Ceph_Pacific_to_Quincy)
The Manager daemons wouldn't restart; I got the "masked" message. Turns out...