For years, I could boot with my AMD GPU (seeing all Proxmox boot messages and the console on it) and then pass it through to a (Linux) VM later on. With the help of vendor-reset and a Proxmox hook script, I could even pass the GPU back to the amdgpu driver after the VM shut down. The amdgpu and vfio-pci drivers worked together nicely. This worked up to pve-kernel-5.11.22-5.
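A hook script of that kind could look roughly like the following minimal Python sketch. It assumes the GPU sits at 0000:0b:00.0 (the address from the error further down) and that Proxmox runs the hookscript with the VM ID and the phase as its two arguments; the function names are made up for illustration. It uses the kernel's sysfs driver_override, unbind and drivers_probe files to move the card from vfio-pci back to amdgpu in the post-stop phase, and leaves any reset handling to vendor-reset.

```python
#!/usr/bin/env python3
# Hypothetical Proxmox hookscript sketch: hand the GPU back to amdgpu after the VM stops.
import sys
from pathlib import Path

# Assumption: PCI address of the passed-through GPU (taken from the error in this post).
GPU_ADDR = "0000:0b:00.0"
PCI_ROOT = Path("/sys/bus/pci")


def rebind(addr: str, driver: str, pci_root: Path = PCI_ROOT) -> None:
    """Detach addr from its current driver and let the kernel re-probe it with `driver`."""
    dev = pci_root / "devices" / addr
    # Restrict the next probe of this device to the requested driver.
    (dev / "driver_override").write_text(driver)
    if (dev / "driver").exists():
        # Writing the address to <driver>/unbind detaches the current driver (e.g. vfio-pci).
        (dev / "driver" / "unbind").write_text(addr)
    # Ask the PCI core to probe the device again; driver_override makes it pick `driver`.
    (pci_root / "drivers_probe").write_text(addr)
    # Clear the override so vfio-pci can claim the device on the next VM start.
    (dev / "driver_override").write_text("\n")


def handle_phase(vmid: str, phase: str, pci_root: Path = PCI_ROOT) -> bool:
    """Rebind the GPU in the post-stop phase; return True if a rebind happened."""
    if phase != "post-stop":
        return False
    print(f"VM {vmid} stopped, returning {GPU_ADDR} to amdgpu")
    rebind(GPU_ADDR, "amdgpu", pci_root)
    return True


if __name__ == "__main__" and len(sys.argv) == 3:
    # Proxmox invokes hookscripts as: <script> <vmid> <phase>
    handle_phase(sys.argv[1], sys.argv[2])
```

Clearing driver_override at the end matters: while it is set to amdgpu, the PCI core refuses to bind the card to any other driver, including vfio-pci on the next VM start.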
Today the enterprise repository updated to PVE 7.1, bringing pve-kernel-5.11.22-7, and this stopped working. The amdgpu driver no longer releases the GPU, and the VM does not start. Blacklisting the amdgpu driver causes the VM to freeze on start and fills the syslog with error messages about BAR 0. I see the same behavior with pve-kernel-5.13.19-1.
I found a work-around by adding video=efifb:off to the kernel parameters (fixing the BAR 0 issue) and binding the GPU early to vfio-pci (which I prefer to blacklisting the driver).
This means that my Proxmox host is now practically headless and I can no longer return a GPU to the host. I can only do passthrough with the first and second x16 slots (both in use by VMs), and the system only shows boot messages and the console on the GPU in the first slot. Therefore, I expect no improvement from adding a third GPU for the Proxmox host. Note that USB controller passthrough still works fine, switching between the xhci_hcd and vfio-pci drivers.
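For the early binding to vfio-pci, one common approach is an options line for vfio-pci in /etc/modprobe.d, plus a softdep so that vfio-pci is loaded before amdgpu. The minimal sketch below (hypothetical helper names, assuming the GPU's slot is 0000:0b:00 and Python 3.9+ for str.removeprefix) builds that snippet from the vendor and device IDs the kernel exposes in sysfs, covering every function of the slot (typically the GPU itself and its HDMI audio function).

```python
from pathlib import Path

PCI_DEVICES = Path("/sys/bus/pci/devices")


def read_id(dev: Path) -> str:
    """Return 'vendor:device' for one PCI function; sysfs stores values like '0x1002'."""
    vendor = (dev / "vendor").read_text().strip().removeprefix("0x")
    device = (dev / "device").read_text().strip().removeprefix("0x")
    return f"{vendor}:{device}"


def vfio_modprobe_conf(slot: str, devices: Path = PCI_DEVICES) -> str:
    """Build a modprobe.d snippet that hands every function of `slot` to vfio-pci.

    `slot` is the PCI address without the function number, e.g. '0000:0b:00'.
    """
    # Collect the unique vendor:device pairs of all functions in this slot.
    ids = sorted({read_id(dev) for dev in devices.glob(f"{slot}.*")})
    if not ids:
        raise ValueError(f"no PCI functions found for slot {slot}")
    return (
        f"options vfio-pci ids={','.join(ids)}\n"
        # Load vfio-pci before amdgpu so that amdgpu never claims the card.
        "softdep amdgpu pre: vfio-pci\n"
    )
```

The output of vfio_modprobe_conf("0000:0b:00") would go into a file such as /etc/modprobe.d/vfio.conf (the file name is only an example), followed by update-initramfs -u so the setting is active at boot; video=efifb:off itself is a kernel command-line parameter and is set separately in the bootloader configuration. Because ids= matches on vendor:device rather than on slot, it would also claim an identical card intended for the host.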
Have other people experienced this with the newer kernels? Is this fixable, or is it new behavior of the amdgpu driver that is unrelated to Proxmox or VFIO?
EDIT: This looks a lot like this problem with an AMD GPU, pve-kernel-5.11.22-7 and a Mac VM, because I get the same error:
Cannot bind 0000:0b:00.0 to vfio (when amdgpu is the driver in use).
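To see which driver is holding the card before starting the VM, a small diagnostic along these lines could help. It is only a sketch with hypothetical function names: it reads the standard sysfs driver symlink and the iommu_group directory, and reports every device in the GPU's IOMMU group that is still bound to something other than vfio-pci (for example amdgpu, as in the error above). It only reports; it does not change any bindings.

```python
from pathlib import Path
from typing import List, Optional, Tuple

PCI_DEVICES = Path("/sys/bus/pci/devices")


def current_driver(addr: str, devices: Path = PCI_DEVICES) -> Optional[str]:
    """Name of the driver bound to `addr`, or None if the device is unbound."""
    link = devices / addr / "driver"
    # sysfs exposes the bound driver as a symlink into /sys/bus/pci/drivers/<name>.
    return link.resolve().name if link.exists() else None


def vfio_blockers(addr: str, devices: Path = PCI_DEVICES) -> List[Tuple[str, str]]:
    """List (address, driver) pairs in addr's IOMMU group held by a non-vfio driver."""
    group = devices / addr / "iommu_group" / "devices"
    # Fall back to the device alone if no IOMMU group is exposed.
    members = sorted(p.name for p in group.iterdir()) if group.exists() else [addr]
    blockers = []
    for member in members:
        driver = current_driver(member, devices)
        if driver is not None and driver != "vfio-pci":
            blockers.append((member, driver))
    return blockers
```

Running vfio_blockers("0000:0b:00.0") on the host right before starting the VM would, in the failing case described here, be expected to list the GPU with amdgpu as its driver.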
			