Proxmox 8.1.3 GPU passthrough soooo close!

ecotechie

Hi,

I'm trying to set up a VM (Linux Mint, or any other Linux) with GPU passthrough. The idea is that I will then connect my screen to the HDMI port of the AMD Radeon RX 7600 to see the VM directly, with a mouse/keyboard passed through as well.

The VM does see the card, as the lspci -v output below shows, but I can't get it to output anything yet.

lspci -v from the host:

Code:
0000:03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 33 [Radeon RX 7700S/7600S] (rev cf) (prog-if 00 [VGA controller])
        Subsystem: Sapphire Technology Limited Navi 33 [Radeon RX 7700S/7600/7600S/7600M XT/PRO W7600]
        Flags: bus master, fast devsel, latency 0, IRQ 16, IOMMU group 18
        Memory at 6000000000 (64-bit, prefetchable) [size=8G]
        Memory at 6200000000 (64-bit, prefetchable) [size=256M]
        I/O ports at 6000 [size=256]
        Memory at 45400000 (32-bit, non-prefetchable) [size=1M]
        Expansion ROM at 45500000 [disabled] [size=128K]
        Capabilities: [48] Vendor Specific Information: Len=08 <?>
        Capabilities: [50] Power Management version 3
        Capabilities: [64] Express Legacy Endpoint, MSI 00
        Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
        Capabilities: [150] Advanced Error Reporting
        Capabilities: [200] Physical Resizable BAR
        Capabilities: [240] Power Budgeting <?>
        Capabilities: [270] Secondary PCI Express
        Capabilities: [2a0] Access Control Services
        Capabilities: [2d0] Process Address Space ID (PASID)
        Capabilities: [320] Latency Tolerance Reporting
        Capabilities: [410] Physical Layer 16.0 GT/s <?>
        Capabilities: [450] Lane Margining at the Receiver <?>
        Kernel driver in use: vfio-pci
        Kernel modules: amdgpu
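
The IOMMU group in the Flags line matters because VFIO works on whole groups, so it helps to know what else shares group 18 with the card. A minimal sketch for listing the members of a device's group, assuming the usual sysfs layout; the function name iommu_group_members and its optional second argument (an alternative sysfs root, handy for dry runs) are made up for illustration:

```shell
#!/bin/sh
# List the PCI devices that share an IOMMU group with the given device.
# Usage: iommu_group_members ADDRESS [SYSFS_PCI_ROOT]
iommu_group_members() {
    dev=$1
    root=${2:-/sys/bus/pci/devices}
    dir="$root/$dev/iommu_group/devices"
    if [ -d "$dir" ]; then
        ls "$dir"
    else
        echo "no IOMMU group found for $dev" >&2
        return 1
    fi
}

# 0000:03:00.0 is the GPU address from the lspci output above.
iommu_group_members 0000:03:00.0 || true
```

If the list contains anything besides the GPU and its audio function, those devices would likely have to go to the VM as well.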


lspci -v from the guest:

[Attached screenshot: 1704583000608.png]


VM's conf file:

Code:
root@pod:~# cat /etc/pve/qemu-server/109.conf
args: -cpu 'host,+kvm_pv_unhalt,+kvm_pv_eoi,hv_vendor_id=NV43FIX,kvm=off'
balloon: 0
bios: ovmf
boot: order=scsi0;ide2
cores: 8
cpu: host,hidden=1,flags=+pcid
efidisk0: local-lvm:vm-109-disk-4,efitype=4m,size=4M
hostpci0: 0000:03:00,pcie=1,x-vga=1
ide2: local:iso/linuxmint-21.1-mate-64bit.iso,media=cdrom,size=2678466K
machine: q35
memory: 16384
meta: creation-qemu=8.1.2,ctime=1704516334
name: mint-mate
net0: virtio=BC:24:11:63:25:36,bridge=vmbr0,firewall=1
numa: 0
ostype: l26
scsi0: local-lvm:vm-109-disk-2,iothread=1,size=32G,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=dca44773-917e-4b7e-a870-6fd41a48ffa7
sockets: 4
vga: std
vmgenid: fe656cce-26fa-4ed4-aace-f3dff3c25e66

[Attached screenshot: 1704585216883.png]

GRUB:
Code:
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt initcall_blacklist=sysfb_init video=efifb:off"


/etc/modprobe.d/vfio.conf:
Code:
options vfio-pci ids=1002:ab30,1002:7480,8086:7a70 disable_vga=1
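
The ids= list has to match the vendor:device pairs that lspci -nn reports for the card. A rough sketch for building that list from lspci output, assuming the card sits at slot 03:00 as in this thread; pci_ids_csv is a made-up helper name:

```shell
#!/bin/sh
# Read `lspci -nn` output on stdin and print the [vendor:device] pairs as a
# comma-separated list suitable for the ids= option of vfio-pci.
pci_ids_csv() {
    grep -oE '\[[0-9a-f]{4}:[0-9a-f]{4}\]' | tr -d '[]' | paste -sd, -
}

# On the host, limit lspci to the slot of the card (03:00 in this thread).
if command -v lspci >/dev/null 2>&1; then
    lspci -nn -s 03:00 | pci_ids_csv
fi
```

Class codes such as [0300] are skipped because they contain no colon, so only the vendor:device pairs end up in the list.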


/etc/modprobe.d/pve-blacklist.conf:
Code:
blacklist amdgpu
blacklist radeon
softdep amdgpu pre: vfio-pci

I did get this to work once on a separate VM, so I think it can be done with my hardware. It worked when I switched my TV to the VM's HDMI port, but I forgot to attach the keyboard/mouse. When I rebooted, it no longer worked. Shoulda taken notes...

Any input here would be fantastic! This is the first step to make my setup usable. I already have another node with many services I'd like to move over to this one, but first things first.
 
args: -cpu 'host,+kvm_pv_unhalt,+kvm_pv_eoi,hv_vendor_id=NV43FIX,kvm=off' is an (old) NVidia work-around; don't use it for AMD GPUs.
hidden=1,flags=+pcid is an (old) NVidia work-around; don't use it for AMD GPUs.
x-vga=1 (Primary GPU) is for NVidia GPUs; don't use it for AMD GPUs.
Try setting Display to None to force output on the GPU.
Why 4 sockets but no NUMA? I don't see the passthrough of a USB controller (or USB ports), but maybe you'll set that up later.
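
The settings questioned above can be spotted in a VM config with a quick grep. This is only a sketch: lint_vm_conf is a hypothetical helper, and the pattern list simply mirrors the remarks in this post (NVIDIA-era flags, Primary GPU, more than one socket); it is not an official check.

```shell
#!/bin/sh
# Print the lines of a VM config that contain the settings questioned above.
# Usage: lint_vm_conf FILE   (exit status 1 means nothing was flagged)
lint_vm_conf() {
    grep -nE 'x-vga=1|hidden=1|kvm=off|hv_vendor_id=|^sockets: *([2-9]|[1-9][0-9]+)$' "$1"
}

# 109 is the VM ID used in this thread.
conf=/etc/pve/qemu-server/109.conf
if [ -r "$conf" ]; then
    lint_vm_conf "$conf" || echo "nothing flagged in $conf"
fi
```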
GRUB:
Code:
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt initcall_blacklist=sysfb_init video=efifb:off"
video=efifb:off does nothing on Proxmox. You need initcall_blacklist=sysfb_init (which you already have) if Proxmox boots with the 7600 as its display. Does the 7600 show output when you boot Proxmox? If so, try to avoid that and boot with the Intel integrated graphics (or another GPU) instead.
What is the output of cat /proc/cmdline? Not all Proxmox installations use GRUB.
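
Since the running kernel only sees what is in /proc/cmdline, checking that file directly avoids guessing which bootloader config was applied. A small sketch, assuming a made-up helper name cmdline_has; the optional second argument is only there so the check can be tried against a saved copy:

```shell
#!/bin/sh
# Succeed if PARAM appears as a whole word on the kernel command line.
# Usage: cmdline_has PARAM [FILE]   (FILE defaults to /proc/cmdline)
cmdline_has() {
    tr -s ' \t' '\n\n' < "${2:-/proc/cmdline}" | grep -qxF -- "$1"
}

# Parameters discussed in this thread.
for p in intel_iommu=on iommu=pt initcall_blacklist=sysfb_init; do
    if cmdline_has "$p"; then
        echo "present: $p"
    else
        echo "missing: $p"
    fi
done
```

A parameter that is in /etc/default/grub but reported as missing here suggests the GRUB config was not regenerated, or that the host does not boot through GRUB at all.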
/etc/modprobe.d/vfio.conf:
Code:
options vfio-pci ids=1002:ab30,1002:7480,8086:7a70 disable_vga=1

/etc/modprobe.d/pve-blacklist.conf:
Code:
blacklist amdgpu
blacklist radeon
softdep amdgpu pre: vfio-pci
No need to blacklist radeon or amdgpu, because you already bind the device to vfio-pci early (and you make sure vfio-pci is loaded first). I would put the softdep in vfio.conf and not edit pve-blacklist.conf, which might change during an update.
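
As a sketch of that suggestion, the snippet below writes a combined vfio.conf to a scratch path for review instead of touching /etc/modprobe.d directly. The scratch path and the VFIO_CONF override variable are assumptions for illustration; the options and softdep lines are copied from the files quoted above.

```shell
#!/bin/sh
# Write the proposed file to a scratch location for review; the real target
# would be /etc/modprobe.d/vfio.conf. VFIO_CONF is a made-up override variable.
conf=${VFIO_CONF:-/tmp/vfio.conf.proposed}
cat > "$conf" <<'EOF'
# Bind these devices to vfio-pci early (IDs copied from the original post).
options vfio-pci ids=1002:ab30,1002:7480,8086:7a70 disable_vga=1
# Make sure vfio-pci is loaded before amdgpu can claim the card.
softdep amdgpu pre: vfio-pci
EOF
cat "$conf"
```

After copying the file into place, the initramfs would need to be regenerated (update-initramfs -u -k all) and the host rebooted for the change to take effect.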
I did get this to work once on a separate VM, so I think it can be done with my hardware. It worked when I switched my TV to the VM's HDMI port, but I forgot to attach the keyboard/mouse. When I rebooted, it no longer worked. Shoulda taken notes...

Any input here would be fantastic! This is the first step to make my setup usable. I already have another node with many services I'd like to move over to this one, but first things first.
Maybe your 7600 does not reset properly and it only works once (per Proxmox boot) and only if it is not used during boot (see earlier remark)?
 
Thanks for the info @leesteken! I've updated things as you suggested, but there is no output on the HDMI port yet. This is what I have now:

cat /proc/cmdline
Code:
BOOT_IMAGE=/boot/vmlinuz-6.5.11-7-pve root=/dev/mapper/pve-root ro quiet intel_iommu=on iommu=pt initcall_blacklist=sysfb_init


cat /etc/pve/qemu-server/109.conf
Code:
balloon: 0
bios: ovmf
boot: order=scsi0;ide2
cores: 8
cpu: host
efidisk0: local-lvm:vm-109-disk-4,efitype=4m,size=4M
hostpci0: 0000:03:00,pcie=1
ide2: local:iso/linuxmint-21.1-mate-64bit.iso,media=cdrom,size=2678466K
machine: q35
memory: 16384
meta: creation-qemu=8.1.2,ctime=1704516334
name: mint-mate
net0: virtio=BC:24:11:63:25:36,bridge=vmbr0,firewall=1
numa: 0
ostype: l26
scsi0: local-lvm:vm-109-disk-2,iothread=1,size=32G,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=dca44773-917e-4b7e-a870-6fd41a48ffa7
sockets: 4
vga: none
vmgenid: fe656cce-26fa-4ed4-aace-f3dff3c25e66


update-initramfs -u -k all
Code:
update-initramfs: Generating /boot/initrd.img-6.5.11-7-pve
Running hook script 'zz-proxmox-boot'..
Re-executing '/etc/kernel/postinst.d/zz-proxmox-boot' in new private mount namespace..
No /etc/kernel/proxmox-boot-uuids found, skipping ESP sync.
update-initramfs: Generating /boot/initrd.img-6.5.11-4-pve
Running hook script 'zz-proxmox-boot'..
Re-executing '/etc/kernel/postinst.d/zz-proxmox-boot' in new private mount namespace..
No /etc/kernel/proxmox-boot-uuids found, skipping ESP sync.


cat /etc/default/grub
Code:
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt initcall_blacklist=sysfb_init"


update-grub
Code:
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-6.5.11-7-pve
Found initrd image: /boot/initrd.img-6.5.11-7-pve
Found linux image: /boot/vmlinuz-6.5.11-4-pve
Found initrd image: /boot/initrd.img-6.5.11-4-pve
Found memtest86+ 64bit EFI image: /boot/memtest86+x64.efi
Adding boot menu entry for UEFI Firmware Settings ...
done

I can't open the VM's console in the web interface without changing vga: none to something else, but I still have my hopes up! So far I have only ever seen the host's output on the built-in (motherboard) HDMI port, never anything from the AMD GPU. I did just restart the node to see if that would change the output, but it made no difference.

I'll try to tweak some settings and read up a bit more to see if I can make it work.
 
Not sure what the problem was, but I ended up running:
Code:
bash -c "$(wget -qLO - https://github.com/tteck/Proxmox/raw/main/misc/microcode.sh)"

And things are sort of working now, until I reboot the VM... Then I have to reboot the host in order to get it working again. Here are my current settings.

cat /etc/pve/qemu-server/109.conf
Code:
balloon: 0
bios: ovmf
boot: order=scsi0
cores: 8
cpu: host
efidisk0: local-lvm:vm-109-disk-4,efitype=4m,size=4M
hostpci0: 0000:03:00,pcie=1,x-vga=1
hostpci2: 0000:00:14.3
ide2: local:iso/linuxmint-21.1-mate-64bit.iso,media=cdrom,size=2678466K
machine: q35
memory: 16384
meta: creation-qemu=8.1.2,ctime=1704516334
name: mint-mate
net0: virtio=BC:24:11:63:25:36,bridge=vmbr0,firewall=1
numa: 1
ostype: l26
scsi0: local-lvm:vm-109-disk-2,iothread=1,size=32G,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=dca44773-917e-4b7e-a870-6fd41a48ffa7
sockets: 4
usb0: host=29ea:0102
usb1: host=062a:4102
vga: none
vmgenid: fe656cce-26fa-4ed4-aace-f3dff3c25e66

/etc/default/grub
Code:
GRUB_CMDLINE_LINUX_DEFAULT="intel_iommu=on iommu=pt"

cat /etc/modprobe.d/vfio.conf
Code:
options vfio-pci ids=8086:7a70 disable_vga=1

Two issues remain:

I can't seem to get GPU passthrough working again after rebooting the VM. The vendor-reset fix does not seem to work, maybe since the rx7600 is not in the supported list of devices? I get "echo: write error: Invalid argument" when trying to update the /sys/bus/pci/devices/0000:03:00.0/reset_method file.

The second problem is that the resolution in the VM is quite low, and I can't seem to get it any higher than 800x600.
 
Not sure what the problem was, but I ended up running:
Code:
bash -c "$(wget -qLO - https://github.com/tteck/Proxmox/raw/main/misc/microcode.sh)"
I don't know what that does.
And things are sort of working now, until I reboot the VM... Then I have to reboot the host in order to get it working again. Here are my current settings.
Sounds like the GPU does not reset properly, which is not uncommon. You'll probably have to live with that until someone provides a work-around in the Linux kernel, vendor-reset or the Windows drivers. The other option is to buy a GPU that is known to work well with passthrough.
I still advise against using Primary GPU or using 4 virtual sockets (unless your system has 4 physical sockets).
The vendor-reset fix does not seem to work, maybe since the rx7600 is not in the supported list of devices?
The vendor-reset web page shows that it only supports specific GPUs, and it does not provide a reset mechanism for the 6000 and 7000 series at all.
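
Before writing to reset_method, it may help to read it and see which reset mechanisms the kernel offers for the card in the first place. A minimal sketch, assuming the kernel exposes that sysfs file for the device; reset_methods is a made-up helper, and its second argument (an alternative sysfs root) exists only for dry runs:

```shell
#!/bin/sh
# Print the reset methods the kernel advertises for a PCI device, or "none"
# when the reset_method file is missing or empty.
# Usage: reset_methods ADDRESS [SYSFS_PCI_ROOT]
reset_methods() {
    f="${2:-/sys/bus/pci/devices}/$1/reset_method"
    methods=$(cat "$f" 2>/dev/null)
    echo "${methods:-none}"
}

# 0000:03:00.0 is the GPU address used earlier in this thread.
reset_methods 0000:03:00.0
```

As far as I know, writing a method name that is not offered for the device is rejected with "Invalid argument", which would match the error reported above.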
 
Sounds like the GPU does not reset properly, which is not uncommon. You'll probably have to live with that until someone provides a work-around in the Linux kernel, vendor-reset or the Windows drivers. The other option is to buy a GPU that is known to work well with passthrough.
Oh, well. It is what it is. Any ideas where I may find info on what are some newer (or good) compatible GPUs?

In the meantime would you have any pointers on what may be causing the VM to display at 800x600?

Thanks a lot for your input and help. I've read several of your other posts helping people with GPUs.
 
Oh, well. It is what it is. Any ideas where I may find info on what are some newer (or good) compatible GPUs?
Anything supported by vendor-reset, plus the Radeon 6800 through 6950 XT (I've only used the latter myself). Search this and other forums for success stories: https://www.reddit.com/r/VFIO/comments/tq9j5v/need_help_compiling_a_list_of_amd_6000_series/
In the meantime would you have any pointers on what may be causing the VM to display at 800x600?
Maybe the driver is not working? I have no experience with the Radeon 7000-series or Windows 11 myself.
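
One way to test the driver theory is to check, inside the guest, which kernel driver is bound to the card. The sketch below parses lspci -k style output; vga_drivers is a hypothetical helper name. The assumption behind it is that a guest showing no amdgpu driver for the card is probably running on a basic firmware framebuffer, which could explain a fixed low resolution.

```shell
#!/bin/sh
# Read `lspci -k` style text on stdin and print "<slot> <driver>" for each
# VGA controller; prints "<slot> (none)" when no driver is bound.
vga_drivers() {
    awk '
        function emit() { if (slot != "") print slot, drv }
        /^[0-9a-f:.]+ / { emit(); slot = "" }
        /VGA compatible controller/ { slot = $1; drv = "(none)" }
        slot != "" && /Kernel driver in use:/ { drv = $NF }
        END { emit() }
    '
}

# Run inside the guest, where amdgpu would be the expected driver.
if command -v lspci >/dev/null 2>&1; then
    lspci -k | vga_drivers
fi
```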
 
The vendor-reset web page shows that it only supports specific GPUs, and it does not provide a reset mechanism for the 6000 and 7000 series at all.
I had the same issue with an RX550 and found that adding amdgpu.runpm=0 to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub solved the problem, so it must have something to do with power management. I've now replaced it with an RX7600 and I don't have any host issues.
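
For anyone who wants to try the same parameter, here is a cautious sketch that appends it to GRUB_CMDLINE_LINUX_DEFAULT in a copy of the file rather than the live one. add_grub_param and the /tmp path are made up for illustration, and the sed -i call assumes GNU sed. The post does not say whether this applies to the host or the guest, so the sketch makes no assumption about that either.

```shell
#!/bin/sh
# Append PARAM to GRUB_CMDLINE_LINUX_DEFAULT in FILE unless already present.
# Usage: add_grub_param FILE PARAM
add_grub_param() {
    file=$1
    param=$2
    line=$(grep '^GRUB_CMDLINE_LINUX_DEFAULT=' "$file") || return 1
    # Strip everything up to the opening quote and the closing quote.
    current=$(printf '%s\n' "$line" | sed 's/^[^"]*"//; s/"$//')
    case " $current " in
        *" $param "*) return 0 ;;   # already there, nothing to do
    esac
    sed -i "s|^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"|\1 $param\"|" "$file"
}

# Work on a copy so nothing changes until the result has been reviewed.
if [ -r /etc/default/grub ]; then
    cp /etc/default/grub /tmp/grub.proposed
    add_grub_param /tmp/grub.proposed amdgpu.runpm=0 || echo "no GRUB_CMDLINE_LINUX_DEFAULT line found"
    grep '^GRUB_CMDLINE_LINUX_DEFAULT=' /tmp/grub.proposed || true
fi
```

Once the reviewed copy is moved into place, update-grub and a reboot are needed before the parameter shows up in /proc/cmdline.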
 
