GPU Passthrough on Ryzen w x470D4U

emlynb

Member
Jul 22, 2020
I have just installed Proxmox 6.2 on an X470D4U board with a Ryzen 3650x processor. The machine was previously running Debian buster with QEMU/KVM set up, and I had passthrough working fine with it (and still do if I boot back into it).

When I try to pass the WX7100 GPU (first PCIe slot) through to Windows 10 in a VM, the VM fails to boot and I get status 'internal-error' on the icon in the web UI (see attached image).

If I boot back into the old Debian system, I have no issues passing it through.

My config is:

Code:
args: -machine 'type=q35,kernel_irqchip=on' -cpu 'host,kvm=off,hv_vendor_id=null'
balloon: 0
bios: ovmf
bootdisk: virtio0
cores: 16
cpu: host
efidisk0: local-zfs:vm-110-disk-0,size=1M
hostpci0: 2b:00,pcie=1,x-vga=1,romfile=WX7100.rom
ide2: local:iso/Win10_2004_English_x64.iso,media=cdrom
machine: q35
memory: 8192
name: Windows
net0: e1000=96:06:E0:9F:D0:F9,bridge=vmbr0,firewall=1
numa: 0
ostype: win10
sata0: local:iso/virtio-win-0.1.189.iso,media=cdrom,size=488766K
scsihw: virtio-scsi-pci
smbios1: uuid=1bb70bd5-5ea6-45df-b73b-907dfa598c98
sockets: 1
vga: none
virtio0: hdd-mirror:vm-110-disk-0,size=256G
vmgenid: 04845727-55b9-4163-9056-4c5f04741692

Kernel command line:
Code:
Command line: initrd=\EFI\proxmox\5.4.60-1-pve\initrd.img-5.4.60-1-pve root=ZFS=rpool/ROOT/pve-1 boot=zfs vfio-pci.ids=1002:67c4,1002:aaf0,10de:1c03,10de:10f1 video=efifb:off vga=normal iommu=pt amd_iommu=on kvm_amd.npt=1
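
For anyone checking which devices this command line hands to vfio-pci at boot, here is a minimal Python sketch that pulls the vendor:device pairs out of the vfio-pci.ids= parameter. The function name vfio_ids is made up for illustration, and it assumes the simple vendor:device form used above, not the longer form with subvendor or class fields.

```python
from typing import List, Tuple

def vfio_ids(cmdline: str) -> List[Tuple[str, str]]:
    """Return (vendor, device) pairs from the vfio-pci.ids= parameter."""
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == "vfio-pci.ids" and sep:
            # Assumes plain "vendor:device" entries separated by commas.
            return [tuple(pair.split(":", 1)) for pair in value.split(",") if pair]
    return []

# Shortened copy of the command line quoted above (initrd= part omitted).
CMDLINE = (
    "root=ZFS=rpool/ROOT/pve-1 boot=zfs "
    "vfio-pci.ids=1002:67c4,1002:aaf0,10de:1c03,10de:10f1 "
    "video=efifb:off vga=normal iommu=pt amd_iommu=on kvm_amd.npt=1"
)

for vendor, device in vfio_ids(CMDLINE):
    print(vendor, device)
```

Vendor 1002 is AMD/ATI and 10de is NVIDIA, so both cards (and what are presumably their audio functions) are listed for vfio-pci.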

How do I go about seeing what is causing the internal error?

I have a second NVIDIA 1060 GPU in the system too, which is passed through fine to the Linux VM that uses it.



Interestingly, a similar config works fine for passing through an Nvidia 1030 card on another Proxmox box.
 

Attachments

  • Screen Shot 2020-09-15 at 11.12.13 AM.png
args: -machine 'type=q35,kernel_irqchip=on' -cpu 'host,kvm=off,hv_vendor_id=null'
You should not need that line.

What do dmesg/syslog show while the VM starts?
 
I needed that line to get the Nvidia card to pass through properly and not run into error 43 in the drivers. I left it in for passing through the WX7100 card as I saw no issue.

Dmesg - anything specific you're looking for?

There are two GPUs in this machine, one of which passes through fine. It is only the primary PCIe slot that does not - and only under Proxmox. Under Debian / Libvirt, it all works properly - so how can I dig into where the 'internal-error' status is coming from?

I wondered if efifb was somehow mucking it up, so I tried disabling it. Doesn't appear to work entirely.

[ 0.000000] Kernel command line: initrd=\EFI\proxmox\5.4.60-1-pve\initrd.img-5.4.60-1-pve root=ZFS=rpool/ROOT/pve-1 boot=zfs vfio-pci.ids=1002:67c4,1002:aaf0,10de:1c03,10de:10f1 video=efifb:off vga=normal iommu=pt amd_iommu=on kvm_amd.npt=1
[ 0.245386] pci 0000:2b:00.0: BAR 0: assigned to efifb

However, once I had added video=efifb:off, efifb no longer grabs the offending region in /proc/iomem:

200f300000-7fffffffff : PCI Bus 0000:00
  7fc0000000-7fd1ffffff : PCI Bus 0000:2c
    7fc0000000-7fcfffffff : 0000:2c:00.0
      7fc0000000-7fcfffffff : vfio-pci
    7fd0000000-7fd1ffffff : 0000:2c:00.0
      7fd0000000-7fd1ffffff : vfio-pci
  7fe0000000-7ff01fffff : PCI Bus 0000:2b
    7fe0000000-7fefffffff : 0000:2b:00.0
    7ff0000000-7ff01fffff : 0000:2b:00.0
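
If you want to check this without eyeballing the listing, here is a rough Python sketch that reports which ranges in /proc/iomem-style text are owned by a given driver label. The helper name claimed_by and the indentation of the sample are my own, with the sample being a shortened copy of the listing above.

```python
def claimed_by(iomem_text, driver):
    """Return the address ranges whose owner label equals `driver`."""
    ranges = []
    for line in iomem_text.splitlines():
        # Each line looks like "<start>-<end> : <owner>", indented by nesting depth.
        span, sep, owner = line.strip().partition(" : ")
        if sep and owner.strip() == driver:
            ranges.append(span)
    return ranges

# Shortened copy of the listing above; the indentation is illustrative.
SAMPLE = """\
7fc0000000-7fd1ffffff : PCI Bus 0000:2c
  7fc0000000-7fcfffffff : 0000:2c:00.0
    7fc0000000-7fcfffffff : vfio-pci
7fe0000000-7ff01fffff : PCI Bus 0000:2b
  7fe0000000-7fefffffff : 0000:2b:00.0
  7ff0000000-7ff01fffff : 0000:2b:00.0
"""

print(claimed_by(SAMPLE, "efifb"))     # [] means efifb holds none of these ranges
print(claimed_by(SAMPLE, "vfio-pci"))  # ['7fc0000000-7fcfffffff']
```

On a live host you would pass in the contents of /proc/iomem, read as root since unprivileged reads typically show zeroed addresses.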
 
So after some tinkering with CSM, boot options etc, I've now got it to the point where it will pick up the correct graphics card for efifb:
Code:
root@pve:~# dmesg | grep efi
[    0.000000] Command line: initrd=\EFI\proxmox\5.4.60-1-pve\initrd.img-5.4.60-1-pve root=ZFS=rpool/ROOT/pve-1 boot=zfs vfio-pci.ids=1002:67c4,1002:aaf0,10de:1c03,10de:10f1 textonly vga=normal video=vesafb:off,astdrmfb,efifb:off amd_iommu=on rd.driver.pre=vfio-pci
[    0.000000] efi: EFI v2.70 by American Megatrends
[    0.000000] efi:  ACPI 2.0=0xbc823000  ACPI=0xbc823000  SMBIOS=0xbd24c000  SMBIOS 3.0=0xbd24b000  MEMATTR=0xb75a6018  ESRT=0xb9dbc798
[    0.000000] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645519600211568 ns
[    0.000000] Kernel command line: initrd=\EFI\proxmox\5.4.60-1-pve\initrd.img-5.4.60-1-pve root=ZFS=rpool/ROOT/pve-1 boot=zfs vfio-pci.ids=1002:67c4,1002:aaf0,10de:1c03,10de:10f1 textonly vga=normal video=vesafb:off,astdrmfb,efifb:off amd_iommu=on rd.driver.pre=vfio-pci
[    0.248800] pci 0000:22:00.0: BAR 0: assigned to efifb
[    0.253772] Registered efivars operations
[    0.649971] efifb: probing for efifb
[    0.649980] efifb: framebuffer at 0xf4000000, using 1876k, total 1875k
[    0.649981] efifb: mode is 800x600x32, linelength=3200, pages=1
[    0.649982] efifb: scrolling: redraw
[    0.649984] efifb: Truecolor: size=8:8:8:8, shift=24:16:8:0
[    1.656014] tsc: Refined TSC clocksource calibration: 3493.437 MHz
[   30.088235] bond0: (slave enp36s0): link status definitely up, 1000 Mbps full duplex
[   30.504232] bond0: (slave enp35s0): link status definitely up, 1000 Mbps full duplex
root@pve:~#

I still cannot pass the WX7100 card through to my VM, though. When I start the VM I see this:
Code:
[ 2091.084019] device tap110i0 entered promiscuous mode
[ 2091.101955] fwbr110i0: port 1(fwln110i0) entered blocking state
[ 2091.101973] fwbr110i0: port 1(fwln110i0) entered disabled state
[ 2091.102325] device fwln110i0 entered promiscuous mode
[ 2091.102519] fwbr110i0: port 1(fwln110i0) entered blocking state
[ 2091.102531] fwbr110i0: port 1(fwln110i0) entered forwarding state
[ 2091.104762] vmbr0: port 5(fwpr110p0) entered blocking state
[ 2091.104783] vmbr0: port 5(fwpr110p0) entered disabled state
[ 2091.105155] device fwpr110p0 entered promiscuous mode
[ 2091.105346] vmbr0: port 5(fwpr110p0) entered blocking state
[ 2091.105359] vmbr0: port 5(fwpr110p0) entered forwarding state
[ 2091.107583] fwbr110i0: port 2(tap110i0) entered blocking state
[ 2091.107601] fwbr110i0: port 2(tap110i0) entered disabled state
[ 2091.107965] fwbr110i0: port 2(tap110i0) entered blocking state
[ 2091.107982] fwbr110i0: port 2(tap110i0) entered forwarding state
[ 2093.151028] vfio-pci 0000:2b:00.0: enabling device (0000 -> 0003)
[ 2093.151749] vfio-pci 0000:2b:00.0: vfio_ecap_init: hiding ecap 0x19@0x270
[ 2093.151775] vfio-pci 0000:2b:00.0: vfio_ecap_init: hiding ecap 0x1b@0x2d0
[ 2093.151796] vfio-pci 0000:2b:00.0: vfio_ecap_init: hiding ecap 0x1e@0x370
[ 2094.291546] pcieport 0000:00:03.1: AER: Uncorrected (Non-Fatal) error received: 0000:00:03.1
[ 2094.291576] pcieport 0000:00:03.1: AER: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
[ 2094.291597] pcieport 0000:00:03.1: AER:   device [1022:1483] error status/mask=00100000/04400000
[ 2094.291614] pcieport 0000:00:03.1: AER:    [20] UnsupReq               (First)
[ 2094.291628] pcieport 0000:00:03.1: AER:   TLP Header: 34000000 2b000010 00000000 80008000
[ 2094.291673] pcieport 0000:00:03.1: AER: Device recovery successful
root@pve:~#

The same AER error also shows up under the working Debian buster install, though, so I have not been concerned by it.
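
To make the AER lines above a bit easier to read, here is a small Python sketch that decodes the status/mask field into bit positions and pulls the requester out of the TLP header. The function names are made up, only bit 20 is given a name because that is the only one the log itself names, and the header decoding assumes the usual request layout where the upper 16 bits of the second dword carry the Requester ID.

```python
import re

# Only bit 20 is named in the log ("[20] UnsupReq"); other bits stay numeric.
KNOWN_BITS = {20: "UnsupReq"}

def set_bits(value):
    """List the bit positions that are set in a 32-bit register value."""
    return [i for i in range(32) if (value >> i) & 1]

def decode_aer(line):
    """Return (status_bits, mask_bits) from an 'error status/mask=S/M' line."""
    m = re.search(r"error status/mask=([0-9a-fA-F]{8})/([0-9a-fA-F]{8})", line)
    if m is None:
        raise ValueError("no AER status/mask field found")
    status, mask = (int(g, 16) for g in m.groups())
    return set_bits(status), set_bits(mask)

def requester_bdf(line):
    """Return bus:dev.fn, assuming the Requester ID sits in bits 31:16 of dword 1."""
    dwords = [int(x, 16) for x in line.split("TLP Header:", 1)[1].split()]
    rid = dwords[1] >> 16
    return "%02x:%02x.%x" % (rid >> 8, (rid >> 3) & 0x1F, rid & 0x7)

STATUS_LINE = "pcieport 0000:00:03.1: AER:   device [1022:1483] error status/mask=00100000/04400000"
TLP_LINE = "pcieport 0000:00:03.1: AER:   TLP Header: 34000000 2b000010 00000000 80008000"

status_bits, mask_bits = decode_aer(STATUS_LINE)
print([KNOWN_BITS.get(b, b) for b in status_bits])  # ['UnsupReq']
print(mask_bits)                                    # [22, 26]
print(requester_bdf(TLP_LINE))                      # 2b:00.0
```

Under that assumption the requester works out to 2b:00.0, which is the WX7100 being passed through, and the one status bit that is set is not among the masked ones.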

So the next step seems to be trying to understand why qemu is unhappy here.
 
OK, after even more googling and tinkering, here's what is happening:

PVE is treating the AER as fatal when in actuality the device has recovered. Adding the pci=noaer parameter to the kernel command line prevents the error messages from hitting syslog, which in turn keeps PVE's QEMU happy.
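
As a rough illustration of the change, here is a small Python sketch that appends pci=noaer to a kernel command line only if it is not already there. The helper name with_param is hypothetical. Where the command line actually lives depends on the bootloader: on systemd-boot installs of Proxmox (which the EFI\proxmox initrd path suggests here) it is usually /etc/kernel/cmdline, followed by a boot-tool refresh, but treat that as an assumption and check your own setup.

```python
def with_param(cmdline, param):
    """Return cmdline with param appended, unless it is already present."""
    tokens = cmdline.split()
    if param not in tokens:
        tokens.append(param)
    return " ".join(tokens)

# Shortened copy of the command line from this thread.
CURRENT = (
    "root=ZFS=rpool/ROOT/pve-1 boot=zfs "
    "vfio-pci.ids=1002:67c4,1002:aaf0,10de:1c03,10de:10f1 "
    "video=vesafb:off,astdrmfb,efifb:off amd_iommu=on"
)

updated = with_param(CURRENT, "pci=noaer")
print(updated)
# Running it a second time changes nothing, so it is safe to re-run.
print(with_param(updated, "pci=noaer") == updated)  # True
```

Note that pci=noaer turns off AER reporting for the whole host, so genuine PCIe errors on other devices would also stop being logged.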
 
@r.jochum - yes I have functional passthrough on this card now too, in addition to the other card.

I guess the version of KVM/QEMU in Debian buster handles these errors differently.
 
