[SOLVED] GPU AMD Passthrough

I've already checked the BIOS, and right now I have some stuff running, so I don't want to shut down the server to do the test you mentioned. I also found the cheapest HDMI-compatible GPU near my place, so I'll keep you posted next week on whether it solves the issue.
 
Hello guys, some updates after buying the cheapest GPU I could find (it appears to be a GT710, as you suggested @xi784).

Below is the current configuration, with all the relevant output. Now when I start the VM, it gets stuck during startup. No matter how long I wait, the VM never starts and the dedicated screen stays off (VNC does not work either).
I tried playing with the CPU args, adding the GPU ROM file, etc., but without success: the VM no longer starts.

If anyone has a clue ! Cheers !

Bash:
root@proxmox:~# lspci
43:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Hawaii PRO [Radeon R9 290/390]
43:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Hawaii HDMI Audio [Radeon R9 290/290X / 390/390X]
44:00.0 VGA compatible controller: NVIDIA Corporation GK208 [GeForce GT 710B] (rev a1)
44:00.1 Audio device: NVIDIA Corporation GK208 HDMI/DP Audio Controller (rev a1)

Bash:
root@proxmox:~# lspci -n -s 43:00
43:00.0 0300: 1002:67b1
43:00.1 0403: 1002:aac8

Bash:
root@proxmox:~# lspci -v
43:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Hawaii PRO [Radeon R9 290/390] (prog-if 00 [VGA controller])
        Subsystem: Gigabyte Technology Co., Ltd Hawaii PRO [Radeon R9 290/390]
        Flags: bus master, fast devsel, latency 0, IRQ 255
        Memory at 80000000 (64-bit, prefetchable) [size=256M]
        Memory at 90000000 (64-bit, prefetchable) [size=8M]
        I/O ports at 4000 [size=256]
        Memory at ae600000 (32-bit, non-prefetchable) [size=256K]
        Expansion ROM at ae640000 [disabled] [size=128K]
        Capabilities: [48] Vendor Specific Information: Len=08 <?>
        Capabilities: [50] Power Management version 3
        Capabilities: [58] Express Legacy Endpoint, MSI 00
        Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
        Capabilities: [150] Advanced Error Reporting
        Capabilities: [200] #15
        Capabilities: [270] #19
        Capabilities: [2b0] Address Translation Service (ATS)
        Capabilities: [2c0] Page Request Interface (PRI)
        Capabilities: [2d0] Process Address Space ID (PASID)
        Kernel driver in use: vfio-pci
        Kernel modules: radeon, amdgpu

43:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Hawaii HDMI Audio [Radeon R9 290/290X / 390/390X]
        Subsystem: Gigabyte Technology Co., Ltd Hawaii HDMI Audio [Radeon R9 290/290X / 390/390X]
        Flags: fast devsel, IRQ 255
        Memory at ae660000 (64-bit, non-prefetchable) [disabled] [size=16K]
        Capabilities: [48] Vendor Specific Information: Len=08 <?>
        Capabilities: [50] Power Management version 3
        Capabilities: [58] Express Legacy Endpoint, MSI 00
        Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
        Capabilities: [150] Advanced Error Reporting
        Kernel driver in use: vfio-pci
        Kernel modules: snd_hda_intel

Bash:
root@proxmox:~# cat /etc/modprobe.d/vfio.conf
options vfio-pci ids=1002:67b1,1002:aac8 disable_vga=1
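
To double-check the ids= list, it can be derived straight from the lspci -n output shown earlier. A minimal sketch, assuming the usual three-column lspci -n format; the helper name vfio_ids is made up for illustration:

```shell
# Build the "ids=" value for vfio-pci from `lspci -n` output read on stdin.
# Each input line is expected to look like: "43:00.0 0300: 1002:67b1"
vfio_ids() {
    # Column 3 is the vendor:device pair; join all pairs with commas.
    awk '{ print $3 }' | paste -sd, -
}

# Usage on the host (43:00 is the slot from this thread):
#   lspci -n -s 43:00 | vfio_ids
```

For the two lines of the R9 290 this prints 1002:67b1,1002:aac8, which matches the value in vfio.conf. On Debian-based hosts such as Proxmox, changes under /etc/modprobe.d usually only take effect after update-initramfs -u and a reboot.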

Bash:
root@proxmox:~# cat /etc/modprobe.d/pve-blacklist.conf
blacklist nvidiafb
blacklist radeon
blacklist amdgpu
blacklist snd_hda_intel

Bash:
root@proxmox:~# cat /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="quiet pci=noaer amd_iommu=on iommu=pt pcie_acs_override=downstream,multifunction nofb nomodeset"
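
Since /etc/default/grub only shows what was requested, it can be worth confirming what the running kernel actually received in /proc/cmdline. A small sketch; missing_params is a hypothetical helper, and it assumes the host really boots through GRUB and that update-grub was run after editing the file (hosts installed on ZFS with UEFI may use a different boot loader):

```shell
# Report which expected parameters are missing from a kernel command line.
# $1 = command line string, remaining arguments = parameters to look for.
missing_params() (
    cmdline=$1
    shift
    for p in "$@"; do
        # Pad with spaces so only whole parameters match.
        case " $cmdline " in
            *" $p "*) ;;
            *) echo "$p" ;;
        esac
    done
)

# Usage on the host:
#   missing_params "$(cat /proc/cmdline)" amd_iommu=on iommu=pt nomodeset
```

Empty output means every listed parameter is present on the running kernel's command line.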

Bash:
root@proxmox:~# cat /etc/pve/qemu-server/102.conf
bios: ovmf
boot: cd
bootdisk: scsi0
cores: 8
cpu: host,hidden=1,flags=+pcid
efidisk0: VM-Storage:vm-102-disk-0,size=1M
hostpci0: 43:00.0,pcie=1,x-vga=on
hostpci1: 43:00.1,pcie=1
hotplug: disk,network,usb
ide0: local:iso/virtio-win-0.1.173.iso,media=cdrom,size=384670K
machine: q35
memory: 8192
name: W10
net0: virtio=9A:80:BE:8B:54:97,bridge=vmbr0,firewall=1
numa: 1
ostype: win10
scsi0: VM-OS:vm-102-disk-0,cache=writeback,size=160G
scsihw: virtio-scsi-pci
smbios1: uuid=0db0d068-f8bf-4e93-8d27-d3f9a869b7ff
sockets: 1
vga: none
vmgenid: d8f2a3b7-6c29-45c1-8298-3ae2665e7a96
 
Bash:
root@proxmox:~# dmesg | grep iommu
[    0.743970] pci 0000:00:01.0: Adding to iommu group 0
[    0.744663] pci 0000:00:01.1: Adding to iommu group 1
[    0.745718] pci 0000:00:02.0: Adding to iommu group 2
[    0.746598] pci 0000:00:03.0: Adding to iommu group 3
[    0.747294] pci 0000:00:04.0: Adding to iommu group 4
[    0.748320] pci 0000:00:07.0: Adding to iommu group 5
[    0.749010] pci 0000:00:07.1: Adding to iommu group 6
[    0.749992] pci 0000:00:08.0: Adding to iommu group 7
[    0.750733] pci 0000:00:08.1: Adding to iommu group 8
[    0.751726] pci 0000:00:14.0: Adding to iommu group 9
[    0.751745] pci 0000:00:14.3: Adding to iommu group 9
[    0.752485] pci 0000:00:18.0: Adding to iommu group 10
[    0.752502] pci 0000:00:18.1: Adding to iommu group 10
[    0.752517] pci 0000:00:18.2: Adding to iommu group 10
[    0.752536] pci 0000:00:18.3: Adding to iommu group 10
[    0.752551] pci 0000:00:18.4: Adding to iommu group 10
[    0.752567] pci 0000:00:18.5: Adding to iommu group 10
[    0.752582] pci 0000:00:18.6: Adding to iommu group 10
[    0.752598] pci 0000:00:18.7: Adding to iommu group 10
[    0.753686] pci 0000:00:19.0: Adding to iommu group 11
[    0.753705] pci 0000:00:19.1: Adding to iommu group 11
[    0.753721] pci 0000:00:19.2: Adding to iommu group 11
[    0.753737] pci 0000:00:19.3: Adding to iommu group 11
[    0.753753] pci 0000:00:19.4: Adding to iommu group 11
[    0.753769] pci 0000:00:19.5: Adding to iommu group 11
[    0.753785] pci 0000:00:19.6: Adding to iommu group 11
[    0.753801] pci 0000:00:19.7: Adding to iommu group 11
[    0.754511] pci 0000:01:00.0: Adding to iommu group 12
[    0.754540] pci 0000:01:00.1: Adding to iommu group 12
[    0.754569] pci 0000:01:00.2: Adding to iommu group 12
[    0.754582] pci 0000:02:00.0: Adding to iommu group 12
[    0.754596] pci 0000:02:04.0: Adding to iommu group 12
[    0.754609] pci 0000:02:05.0: Adding to iommu group 12
[    0.754623] pci 0000:02:06.0: Adding to iommu group 12
[    0.754636] pci 0000:02:07.0: Adding to iommu group 12
[    0.754657] pci 0000:04:00.0: Adding to iommu group 12
[    0.754681] pci 0000:05:00.0: Adding to iommu group 12
[    0.754701] pci 0000:06:00.0: Adding to iommu group 12
[    0.755717] pci 0000:08:00.0: Adding to iommu group 13
[    0.756411] pci 0000:08:00.2: Adding to iommu group 14
[    0.757300] pci 0000:08:00.3: Adding to iommu group 15
[    0.757943] pci 0000:09:00.0: Adding to iommu group 16
[    0.758855] pci 0000:09:00.2: Adding to iommu group 17
[    0.759480] pci 0000:40:01.0: Adding to iommu group 18
[    0.760362] pci 0000:40:01.1: Adding to iommu group 19
[    0.760965] pci 0000:40:01.2: Adding to iommu group 20
[    0.761855] pci 0000:40:01.3: Adding to iommu group 21
[    0.762452] pci 0000:40:02.0: Adding to iommu group 22
[    0.763352] pci 0000:40:03.0: Adding to iommu group 23
[    0.764114] pci 0000:40:03.1: Adding to iommu group 24
[    0.764777] pci 0000:40:04.0: Adding to iommu group 25
[    0.765721] pci 0000:40:07.0: Adding to iommu group 26
[    0.766345] pci 0000:40:07.1: Adding to iommu group 27
[    0.767270] pci 0000:40:08.0: Adding to iommu group 28
[    0.767892] pci 0000:40:08.1: Adding to iommu group 29
[    0.768836] pci 0000:41:00.0: Adding to iommu group 30
[    0.769611] pci 0000:42:00.0: Adding to iommu group 31
[    0.770284] pci 0000:43:00.0: Adding to iommu group 32
[    0.770353] pci 0000:43:00.0: Using iommu direct mapping
[    0.770389] pci 0000:43:00.1: Adding to iommu group 32
[    0.770722] pci 0000:44:00.0: Adding to iommu group 33
[    0.770757] pci 0000:44:00.1: Adding to iommu group 33
[    0.771663] pci 0000:45:00.0: Adding to iommu group 34
[    0.772289] pci 0000:45:00.2: Adding to iommu group 35
[    0.773183] pci 0000:45:00.3: Adding to iommu group 36
[    0.773823] pci 0000:46:00.0: Adding to iommu group 37
[    0.774739] pci 0000:46:00.2: Adding to iommu group 38
[    0.776715] perf/amd_iommu: Detected AMD IOMMU #0 (2 banks, 4 counters/bank).
[    0.776732] perf/amd_iommu: Detected AMD IOMMU #1 (2 banks, 4 counters/bank).

Bash:
root@proxmox:~# dmesg | grep -i -e DMAR -e IOMMU
[    0.742246] pci 0000:00:00.2: AMD-Vi: IOMMU performance counters supported
[    0.742301] pci 0000:40:00.2: AMD-Vi: IOMMU performance counters supported
[    0.743970] pci 0000:00:01.0: Adding to iommu group 0
[    0.744663] pci 0000:00:01.1: Adding to iommu group 1
[    0.745718] pci 0000:00:02.0: Adding to iommu group 2
[    0.746598] pci 0000:00:03.0: Adding to iommu group 3
[    0.747294] pci 0000:00:04.0: Adding to iommu group 4
[    0.748320] pci 0000:00:07.0: Adding to iommu group 5
[    0.749010] pci 0000:00:07.1: Adding to iommu group 6
[    0.749992] pci 0000:00:08.0: Adding to iommu group 7
[    0.750733] pci 0000:00:08.1: Adding to iommu group 8
[    0.751726] pci 0000:00:14.0: Adding to iommu group 9
[    0.751745] pci 0000:00:14.3: Adding to iommu group 9
[    0.752485] pci 0000:00:18.0: Adding to iommu group 10
[    0.752502] pci 0000:00:18.1: Adding to iommu group 10
[    0.752517] pci 0000:00:18.2: Adding to iommu group 10
[    0.752536] pci 0000:00:18.3: Adding to iommu group 10
[    0.752551] pci 0000:00:18.4: Adding to iommu group 10
[    0.752567] pci 0000:00:18.5: Adding to iommu group 10
[    0.752582] pci 0000:00:18.6: Adding to iommu group 10
[    0.752598] pci 0000:00:18.7: Adding to iommu group 10
[    0.753686] pci 0000:00:19.0: Adding to iommu group 11
[    0.753705] pci 0000:00:19.1: Adding to iommu group 11
[    0.753721] pci 0000:00:19.2: Adding to iommu group 11
[    0.753737] pci 0000:00:19.3: Adding to iommu group 11
[    0.753753] pci 0000:00:19.4: Adding to iommu group 11
[    0.753769] pci 0000:00:19.5: Adding to iommu group 11
[    0.753785] pci 0000:00:19.6: Adding to iommu group 11
[    0.753801] pci 0000:00:19.7: Adding to iommu group 11
[    0.754511] pci 0000:01:00.0: Adding to iommu group 12
[    0.754540] pci 0000:01:00.1: Adding to iommu group 12
[    0.754569] pci 0000:01:00.2: Adding to iommu group 12
[    0.754582] pci 0000:02:00.0: Adding to iommu group 12
[    0.754596] pci 0000:02:04.0: Adding to iommu group 12
[    0.754609] pci 0000:02:05.0: Adding to iommu group 12
[    0.754623] pci 0000:02:06.0: Adding to iommu group 12
[    0.754636] pci 0000:02:07.0: Adding to iommu group 12
[    0.754657] pci 0000:04:00.0: Adding to iommu group 12
[    0.754681] pci 0000:05:00.0: Adding to iommu group 12
[    0.754701] pci 0000:06:00.0: Adding to iommu group 12
[    0.755717] pci 0000:08:00.0: Adding to iommu group 13
[    0.756411] pci 0000:08:00.2: Adding to iommu group 14
[    0.757300] pci 0000:08:00.3: Adding to iommu group 15
[    0.757943] pci 0000:09:00.0: Adding to iommu group 16
[    0.758855] pci 0000:09:00.2: Adding to iommu group 17
[    0.759480] pci 0000:40:01.0: Adding to iommu group 18
[    0.760362] pci 0000:40:01.1: Adding to iommu group 19
[    0.760965] pci 0000:40:01.2: Adding to iommu group 20
[    0.761855] pci 0000:40:01.3: Adding to iommu group 21
[    0.762452] pci 0000:40:02.0: Adding to iommu group 22
[    0.763352] pci 0000:40:03.0: Adding to iommu group 23
[    0.764114] pci 0000:40:03.1: Adding to iommu group 24
[    0.764777] pci 0000:40:04.0: Adding to iommu group 25
[    0.765721] pci 0000:40:07.0: Adding to iommu group 26
[    0.766345] pci 0000:40:07.1: Adding to iommu group 27
[    0.767270] pci 0000:40:08.0: Adding to iommu group 28
[    0.767892] pci 0000:40:08.1: Adding to iommu group 29
[    0.768836] pci 0000:41:00.0: Adding to iommu group 30
[    0.769611] pci 0000:42:00.0: Adding to iommu group 31
[    0.770284] pci 0000:43:00.0: Adding to iommu group 32
[    0.770353] pci 0000:43:00.0: Using iommu direct mapping
[    0.770389] pci 0000:43:00.1: Adding to iommu group 32
[    0.770722] pci 0000:44:00.0: Adding to iommu group 33
[    0.770757] pci 0000:44:00.1: Adding to iommu group 33
[    0.771663] pci 0000:45:00.0: Adding to iommu group 34
[    0.772289] pci 0000:45:00.2: Adding to iommu group 35
[    0.773183] pci 0000:45:00.3: Adding to iommu group 36
[    0.773823] pci 0000:46:00.0: Adding to iommu group 37
[    0.774739] pci 0000:46:00.2: Adding to iommu group 38
[    0.774956] pci 0000:00:00.2: AMD-Vi: Found IOMMU cap 0x40
[    0.774963] pci 0000:40:00.2: AMD-Vi: Found IOMMU cap 0x40
[    0.776715] perf/amd_iommu: Detected AMD IOMMU #0 (2 banks, 4 counters/bank).
[    0.776732] perf/amd_iommu: Detected AMD IOMMU #1 (2 banks, 4 counters/bank).
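
The "Adding to iommu group" lines can also be read back from sysfs, which is easier to scan than dmesg. A minimal sketch, assuming the standard /sys/kernel/iommu_groups layout; list_iommu_groups is a made-up name, and the optional argument exists only so the function can be pointed at a test directory:

```shell
# Print each IOMMU group together with the PCI addresses it contains.
# $1 = sysfs root (defaults to /sys).
list_iommu_groups() (
    sys=${1:-/sys}
    for grp in "$sys"/kernel/iommu_groups/*; do
        [ -d "$grp/devices" ] || continue
        for dev in "$grp"/devices/*; do
            # Skip the unexpanded glob pattern when a group is empty.
            [ -e "$dev" ] || [ -L "$dev" ] || continue
            printf 'group %s: %s\n' "${grp##*/}" "${dev##*/}"
        done
    done
)

# Usage on the host (groups come out in lexical order, so sort numerically):
#   list_iommu_groups | sort -k2,2n
```

In the output above, group 32 holds only 43:00.0 and 43:00.1, i.e. both functions of the R9 290 and nothing else, which is what passthrough needs. Keep in mind that pcie_acs_override is set on this host, so the grouping may look cleaner than what the hardware really isolates.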
 
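Again, this is wrong:

hostpci0: 43:00.0,pcie=1,x-vga=on
hostpci1: 43:00.1,pcie=1

This is right (leaving out the function number passes through all functions of the card, audio included):
hostpci0: 43:00,pcie=1,x-vga=on,romfile=vbios.bin

I would also recommend using a VBIOS ROM file.
It must be located in /usr/share/kvm/, e.g. /usr/share/kvm/vbios.bin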
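
If you don't have a ROM file yet, one common way to get one is to dump it from the card through sysfs. This is a sketch under assumptions: dump_vbios is a hypothetical helper, it must run as root, the card should not be in use by a VM, and a dump taken from the GPU the host booted on may not be usable:

```shell
# Dump a PCI device's option ROM via its sysfs "rom" attribute.
# $1 = device directory (e.g. /sys/bus/pci/devices/0000:43:00.0), $2 = output file.
dump_vbios() (
    dev=$1
    out=$2
    if [ ! -e "$dev/rom" ]; then
        echo "no rom attribute under $dev" >&2
        exit 1
    fi
    # Writing 1 makes the ROM readable; writing 0 hides it again afterwards.
    echo 1 > "$dev/rom" || exit 1
    if cat "$dev/rom" > "$out"; then rc=0; else rc=1; fi
    echo 0 > "$dev/rom"
    exit "$rc"
)

# Usage on the host:
#   dump_vbios /sys/bus/pci/devices/0000:43:00.0 /usr/share/kvm/vbios.bin
```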

-------------------------------------
Can you check this output:

dmesg | grep ecap
 
Also, you are better off with AMD, as the Nvidia driver checks for VMs. If an Nvidia card runs inside a virtual machine, the driver will report a "bug" (error 43).
I switched completely to AMD.
 
I know, but I tried the hostpci line both with and without the ROM file, and the VM still hangs in the starting state. I just copy-pasted my last test, but there were many. I also get no display output, either on the screen plugged into the GPU or on the integrated VNC. When I double-click the line at the bottom, the one that lists all your actions, there is no output: it just says it's starting, and it hangs like this for ages (I let it run for 3 hours with no result).
 
As for AMD vs. Nvidia: yes, but I'm familiar with Nvidia for its encoding/decoding features. The main goal of this GPU passthrough is a media server able to decode multiple 4K videos. I tried with the CPU only, now that I have 24 cores, but even with all the cores assigned I cannot decode multiple files (one is OK, but with two the decoding drops to 20 fps, which is a no-go for a media server). So I have to use a GPU.
I have no issue going AMD, though; I just could not find the info I need.
Do you know a webpage similar to this one, but for AMD: https://developer.nvidia.com/video-encode-decode-gpu-support-matrix

Also, a side question: say you have a small Nvidia card just for the Proxmox display (like the GT710 I bought) and you want to pass a second Nvidia card through to a VM. If you blacklist the Nvidia driver at boot, you also blacklist the driver of the main GPU.
So how do you pass through one GPU when it uses the same brand/driver as the main GPU?
That won't be my case, I hope; I'm just curious.
 
I'm also surprised: I thought the passthrough card was an AMD Radeon R9 290.

In the end it doesn't change anything: for the Nvidia card, only a standard VGA driver is loaded. What matters is that the passthrough card is bound to the vfio-pci kernel module and is not recognized as the primary card.

I think the Nvidia GT710 is not being recognized as the primary card. It could help to put the Nvidia in the first PCIe slot and the AMD below it.
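
On the side question about two GPUs of the same brand: the vfio-pci ids= option matches by vendor:device ID, so it only gets awkward when both cards are the identical model. In that case, one option is to pick the card by PCI address through the sysfs driver_override file. A rough sketch, not something tested in this thread: bind_vfio_by_addr is a hypothetical helper, it needs root and an already loaded vfio-pci module, and the second argument exists only for testing against a fake directory tree:

```shell
# Bind one PCI function to vfio-pci by address, leaving same-ID siblings alone.
# $1 = full PCI address as printed by `lspci -D`, $2 = sysfs root (defaults to /sys).
bind_vfio_by_addr() (
    addr=$1
    sys=${2:-/sys}
    dev="$sys/bus/pci/devices/$addr"
    if [ ! -d "$dev" ]; then
        echo "no such device: $addr" >&2
        exit 1
    fi
    # Tell the PCI core that only vfio-pci may claim this function.
    echo vfio-pci > "$dev/driver_override"
    # Detach whatever driver currently holds it, if any.
    if [ -e "$dev/driver/unbind" ]; then
        echo "$addr" > "$dev/driver/unbind"
    fi
    # Ask the PCI core to probe again; vfio-pci picks the device up via the override.
    echo "$addr" > "$sys/bus/pci/drivers_probe"
)
```

It would have to be called once per function of the card (video and audio), since both sit in the same IOMMU group.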

---------------------------
dmesg | grep ecap
 
No, no, you are right: I'm currently trying to pass through the AMD GPU. A friend lent it to me until I can get everything working and know what I have to buy.

I also have a GT710 in the first PCI slot; the AMD is in the second slot.

Once I succeed with the AMD GPU passthrough, I'll try the same with an Nvidia one (though I'll apparently hit error 43). Hence the side question for my future self: can you have two GPUs of the same brand and pass through only one?

As for your dmesg output: I'm facing performance issues with ZFS right now, and that is my main concern. The server has been reinstalled, and I'll get back to you as soon as I can with this output from a fresh system. Sorry for the inconvenience, and thanks for your help ;)
 
OK, a bit of improvement:

I realized that, no matter what, if a screen is plugged into the AMD GPU (the one I want to pass through), the system uses it as the primary GPU.
I wonder whether the AMD card is faster to boot than the Nvidia or whether it's another parameter, but I can see the POST message on the Nvidia display, then it blacks out, and the Proxmox GRUB page appears on the AMD display.
If I disable the AMD driver, the AMD screen freezes at some point during boot.
In this specific setup, starting the VM results in the screenshot I posted previously (the shattered Proxmox logo), and the VM does not boot.

So I unplugged the display from the AMD GPU and restarted Proxmox. Once it had booted, I plugged the display back in and started the VM, with almost the same result: the VM hangs on the Proxmox logo, but this time it's not shattered.

If I specify that the AMD GPU is not the primary display, the VM boots, and once in Windows I see an error message saying that "this device [the AMD GPU] has been disabled by Windows (code error 43)".

So my guess is: the confs were correct from the beginning, and the issue lies rather in this code 43 error on the AMD card, which disables the display!

As you requested: dmesg | grep ecap is empty, whether I boot with the Nvidia card as primary or with the AMD card as primary.
 
When booting the VM, I can see these logs on Proxmox :

[ 84.090188] vfio-pci 0000:43:00.0: vfio_ecap_init: hiding ecap 0x19@0x270
[ 84.090209] vfio-pci 0000:43:00.0: vfio_ecap_init: hiding ecap 0x1b@0x2d0
[ 84.115715] vfio-pci 0000:43:00.1: enabling device (0000 -> 0002)

So audio seems to be passed through fine, but not the video.

Edit: I got it working... at last.
Apparently I don't even need a dummy display: even without an HDMI cable plugged in, the VM boots, I can RDP into it, the GPU is OK, and I can stress-test it with FurMark. So everything seems fine now. I'll just wait 2-3 days of testing before marking this thread as solved.

So the issue was: the AMD GPU was selected as the primary GPU, regardless of the PCI slot it used.
Solution: unplug the HDMI cable from the AMD GPU (my intended second GPU), so the system boots on the Nvidia.

All the conf mentioned in this post was OK; the issue was not in the configuration.
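
Which card the firmware picked as primary can be read from the boot_vga attribute in sysfs, which helps spot this situation before starting the VM. A minimal sketch, assuming the standard /sys/bus/pci/devices layout; list_vga is a made-up helper name and its optional argument is only there for testing:

```shell
# Print every VGA-class PCI device with its boot_vga flag and bound driver.
# $1 = sysfs root (defaults to /sys). boot_vga=1 marks the GPU the host booted on.
list_vga() (
    sys=${1:-/sys}
    for dev in "$sys"/bus/pci/devices/*; do
        # Only VGA devices expose a boot_vga attribute.
        [ -f "$dev/boot_vga" ] || continue
        drv=none
        # The "driver" symlink points at the bound driver's directory.
        if [ -L "$dev/driver" ]; then
            drv=$(basename "$(readlink "$dev/driver")")
        fi
        printf '%s boot_vga=%s driver=%s\n' "${dev##*/}" "$(cat "$dev/boot_vga")" "$drv"
    done
)

# Usage on the host:
#   list_vga
```

A line with boot_vga=1 for the card you intend to pass through means the host booted on it, which is the situation described above.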
 
After spending many hours on this very topic: stop buying AMD GPUs. I was stupid enough to buy an AMD Radeon VII, which also has the reset issue. AMD should very well be able to fix this, but simply ignores the topic. It is not even documented on their portal.

File as many bug reports as you can, harass them, and so on. Personally, I sent my last mail to AMD asking them to clarify the issue. If they do not, I will never buy AMD again. This stinks.

EDIT: AMD responded, stating that the issues are entirely with Linux software; to my understanding, this means VFIO.
 