[SOLVED] GPU AMD Passthrough

Apr 10, 2020
24
0
1
Yes I know, another GPU passthrough post...;)

Well, hello everyone. This is my first time with Proxmox. I'm trying to set up GPU passthrough on an AMD platform, and after following multiple guides, videos and the wiki I still cannot get it to work...

Server detail :
  • AMD Ryzen Threadripper 1920x (AMD-v and AMD-vi available)
  • Asrock X399 MB with IOMMU enabled and CSM disabled (so full UEFI)
  • AMD Radeon R9 290
  • And all the other usual stuff which don't matter for this issue
  • Proxmox : 6.1-8
  • Windows image : 1909 (latest from windows website at the time of writing)

Goal : Windows 10 VM with GPU passthrough

Useful outputs:
root@proxmox:~# lspci
43:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Hawaii PRO [Radeon R9 290/390]
43:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Hawaii HDMI Audio [Radeon R9 290/290X / 390/390X]
root@proxmox:~# find /sys/kernel/iommu_groups/ -type l
/sys/kernel/iommu_groups/31/devices/0000:43:00.0
/sys/kernel/iommu_groups/31/devices/0000:43:00.1

My GPU is 0000:43:00 (.0 for the GPU itself and .1 for the audio) and if I read/understand the output correctly, it belongs to its own IOMMU group 31.
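
To read that find output at a glance, each path can be folded into one "group: device" line. This is only a minimal sketch: the function name list_iommu_groups is made up for illustration, and it does nothing more than reformat the kind of paths shown above.

```shell
#!/bin/sh
# Hypothetical helper: reformat `find /sys/kernel/iommu_groups/ -type l`
# output (read from stdin) into "group <n>: <pci address>" lines,
# sorted by group number.
list_iommu_groups() {
    sed -n 's#.*/iommu_groups/\([0-9][0-9]*\)/devices/\(.*\)$#group \1: \2#p' |
        sort -k2,2n
}
```

Usage would be `find /sys/kernel/iommu_groups/ -type l | list_iommu_groups`. For passthrough, the GPU functions should ideally be the only entries in their group, which matches what group 31 shows here.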

root@proxmox:~# cat /etc/modprobe.d/blacklist.conf
blacklist radeon
blacklist nouveau
blacklist nvidia

root@proxmox:~# cat /etc/modprobe.d/pve-blacklist.conf
# This file contains a list of modules which are not supported by Proxmox VE
# nidiafb see bugreport https://bugzilla.proxmox.com/show_bug.cgi?id=701
blacklist nvidiafb
blacklist nouveau
blacklist radeon
blacklist nvidia

I was not sure where to put this blacklist, since the advice varies from one tutorial to the next. But I guess all .conf files in /etc/modprobe.d are loaded at boot, so it should not matter.

root@proxmox:~# lspci -n -s 43:00
43:00.0 0300: 1002:67b1
43:00.1 0403: 1002:aac8

root@proxmox:~# cat /etc/modprobe.d/vfio.conf
options vfio-pci ids=1002:67b1,1002:aac8 disable_vga=1
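
The ids= list in vfio.conf is just the vendor:device pairs from the lspci -n output above, joined with commas. A small sketch of that step follows. The function name vfio_options_line is invented for the example, and disable_vga=1 is simply copied from the file shown here.

```shell
#!/bin/sh
# Hypothetical helper: read `lspci -n -s <slot>` output on stdin and print
# the matching vfio-pci options line. The third field of each lspci -n
# line is the vendor:device pair, e.g. "1002:67b1".
vfio_options_line() {
    ids=$(awk '{ print $3 }' | paste -sd, -)
    # Nothing to emit if lspci printed no devices.
    [ -n "$ids" ] || return 1
    printf 'options vfio-pci ids=%s disable_vga=1\n' "$ids"
}
```

Usage would be `lspci -n -s 43:00 | vfio_options_line`. After changing files in /etc/modprobe.d, the usual follow-up on Proxmox is `update-initramfs -u` and a reboot.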

root@proxmox:~# cat /etc/default/grub
GRUB_DEFAULT=0
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="Proxmox Virtual Environment"
GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on video=vesafb:off,efifb:off"
#GRUB_CMDLINE_LINUX_DEFAULT="textonly video=astdrmfb video=efifb:off"
GRUB_CMDLINE_LINUX="root=ZFS=rpool/ROOT/pve-1 boot=zfs"
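
It can be worth confirming that these parameters actually reach the running kernel by comparing them with /proc/cmdline. To my knowledge, Proxmox VE 6 installs with ZFS as root that boot in UEFI mode use systemd-boot rather than GRUB. In that case the kernel command line comes from /etc/kernel/cmdline and edits to /etc/default/grub have no effect. Whether that applies to this host is not confirmed anywhere in the thread, so treat it as an assumption. The helper name missing_params below is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print every expected parameter that does NOT appear
# as a whole word in the given kernel command line.
#   $1      the command line to inspect (e.g. contents of /proc/cmdline)
#   $2...   parameters that are expected to be present
missing_params() {
    cmdline=$1
    shift
    for p in "$@"; do
        case " $cmdline " in
            *" $p "*) ;;                 # present, nothing to report
            *) printf '%s\n' "$p" ;;     # absent, report it
        esac
    done
}
```

Usage would be `missing_params "$(cat /proc/cmdline)" amd_iommu=on video=vesafb:off,efifb:off`. Empty output means every listed parameter is active.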

root@proxmox:~# cat /etc/pve/qemu-server/103.conf
bios: ovmf
bootdisk: scsi0
cores: 8
hostpci0: 43:00,pcie=1,x-vga=1
ide0: local:iso/virtio-win-0.1.173.iso,media=cdrom,size=384670K
ide2: local:iso/WINDOWS_10_3en1_1909__18363.720__FR_X64__DREAM_TEAM_CUSTOM_OS_.iso,media=cdrom
machine: q35
memory: 8192
name: W10-OVMF
net0: virtio=02:6C:80:E4:B1:0B,bridge=vmbr0,firewall=1
numa: 0
ostype: win10
scsi0: VM-OS:vm-103-disk-0,size=100G
scsihw: virtio-scsi-pci
smbios1: uuid=f42051a2-d52b-4853-9945-6fc0f2152c7f
sockets: 1
vmgenid: 9416e428-8594-479b-8803-df1767695805

For info, if I add cpu: host,hidden=1, Windows crashes at the very beginning with a BSOD.

What I get with this configuration :
When I boot up Proxmox I lose the video (the server is connected to my TV now), which is normal as I understand it.
When I boot the W10 VM I get the Proxmox logo and two QEMU lines telling me that the disk was found and will boot, but it gets stuck there: RAM goes to 95% usage, CPU to 30%, and the screen looks shattered. I don't know exactly how to describe it, but if you do a Google image search for "faulty GPU screen", it looks like that.
However, the GPU is not faulty: at the beginning I installed W10 directly on the hardware (not as a VM) to test everything, and it was OK.

I also tried to change the PCI slot the card is plugged in. No change.

Any help is appreciated, and I can perform more tests if needed; there is nothing running on Proxmox yet...

Thanks guys !
 

jackydany

Member
Jul 20, 2016
26
1
23
40
Hi,

I am not a professional, but here are some things you can try and check.

I had very strange problems with W10 and UEFI. I changed the ISO (downloaded a new one on a different computer) and changed the virtio driver disk from stable to snapshot. So you would use 0.1.173 at the moment, I think. After that, W10 with UEFI and AMD GPU passthrough worked for me.
Also use SCSI for the disk, virtio for the network, etc., everything as described in the wiki.

Don't set primary VGA = 1; otherwise your card will be the main GPU and you can't see the noVNC console anymore.
For testing it's better to leave the default GPU in the VM as primary and active so you don't get blanked out.

Try cpu = host but not hidden; that worked for me.

Here is my post about my problems:
https://forum.proxmox.com/threads/w10-ovmf-installation-impossible.67922/

Maybe it will help you.
 
Apr 10, 2020
24
0
1
I changed the ISO (downloaded a new one on a different computer) and changed the virtio driver disk from stable to snapshot.

Already tried. I have one VM with the original Windows ISO from their website, and another with a custom one.

So you would use 0.1.173 at the moment, I think. After that, W10 with UEFI and AMD GPU passthrough worked for me.

I was already using 0.1.173-9 (latest).

Also use SCSI for the disk, virtio for the network, etc., everything as described in the wiki.

I think that's how my VM is configured. I double-checked and I have everything as in the wiki.

Don't set primary VGA = 1; otherwise your card will be the main GPU and you can't see the noVNC console anymore.
For testing it's better to leave the default GPU in the VM as primary and active so you don't get blanked out.

I don't mind since I have a dedicated screen for this VM.

Try cpu = host but not hidden; that worked for me.

BSOD directly at boot with this option...

I don't know what else I could try right now
 
Apr 10, 2020
24
0
1
Hello,

I'm still looking for a way to make it work.
I know I don't provide any new information, but I don't know what to test anymore now...

I was wondering if having only one GPU in the server could cause any trouble ? Since I'm on Threadripper, I do not have any embedded GPU. Do you think it can cause any issue ?

Thanks !
 

xi784

New Member
Feb 14, 2019
13
0
1
36
blacklist amdgpu

cpu: host,hidden=1

try numa: 1

check with lspci -v which kernel driver is in use.
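
The lspci -v (or lspci -k) output is long, so it can help to reduce it to one "device driver" pair per function. A minimal sketch follows, with driver_in_use as a made-up name. It only relies on the "Kernel driver in use:" label that lspci prints.

```shell
#!/bin/sh
# Hypothetical helper: read `lspci -v` or `lspci -k` output on stdin and
# print "<pci address> <driver>" for every function that has a driver bound.
driver_in_use() {
    awk '
        # A device line starts with a PCI address such as 43:00.0 or 0000:43:00.0.
        /^([0-9a-f]+:)?[0-9a-f]+:[0-9a-f]+\.[0-9a-f]/ { dev = $1 }
        # The driver name is the last field on the label line.
        /Kernel driver in use:/ { print dev, $NF }
    '
}
```

Usage would be `lspci -v -s 43:00 | driver_in_use`. For passthrough both functions should report vfio-pci rather than radeon, amdgpu or snd_hda_intel.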



GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on video=vesafb:off,efifb:off"
I think "video=vesafb:off,efifb:off" is not needed anymore.

but you can also try to add:
pcie_acs_override=downstream

then check the iommu groups
dmesg | grep iommu
 

gorfy

New Member
Apr 21, 2020
1
0
1
31
I'm in this exact situation, and even managed to kill the network to my entire node somehow while toying with different configurations. Running a 1920X on an X399 Taichi. Please let me know if anyone finds a solution; it would be much appreciated.
 

Katyusha86

New Member
Apr 26, 2020
1
0
1
34
Hello,

I registered on this forum just to say that separating the VGA and audio functions worked for me. (AMD 7870 + i5 3570)

So:
[Attached screenshot: 1587872353396.png]

This was a bad night for me. I just tried Netflix in the browser and it froze my Win10 VM.
After that the VM did not respond to any stop, shutdown, etc. from the Proxmox GUI.
I took the opportunity to update/upgrade Proxmox and reboot the host, and then there was no display on the VM, nada.

Had to find a solution on reddit:
https://www.reddit.com/r/Proxmox/comments/fs8j5d/error_43_nvidia_passthrough_proxmox_618/
 

Veeh

Member
Jul 2, 2017
33
2
13
34
I agree with xi784, you should try adding pcie_acs_override=downstream to your GRUB config.

Do you have a display on the VNC screen when you remove the x-vga=1 flag?

FYI, I built my config with the same video card and an Intel CPU.
You can check my config here: https://forum.proxmox.com/threads/gpu-passthrough-issue-6-1-radeon-blank-screen.68821/#post-308566
 
Apr 10, 2020
24
0
1
Thank you all for your answers,

I'll try not to forget anyone's recommendation !

TL;DR : still not resolved

@xi784 :
Blacklist amdgpu: no change. See my blacklist.conf below:
root@proxmox:~# cat /etc/modprobe.d/blacklist.conf
blacklist radeon
blacklist nouveau
blacklist nvidia
blacklist amdgpu
blacklist snd_hda_intel

I tried cpu:host, cpu:host,hidden=1 and adding numa without any improvements. See below the VM conf file right now:
root@proxmox:~# cat /etc/pve/qemu-server/102.conf
bios: ovmf
boot: cd
bootdisk: scsi0
cores: 8
cpu: host,hidden=1
efidisk0: VM-Storage:vm-102-disk-0,size=1M
hostpci0: 43:00.0,pcie=1
hostpci1: 43:00.1,pcie=1
hotplug: disk,network,usb
ide0: local:iso/virtio-win-0.1.173.iso,media=cdrom,size=384670K
machine: q35
memory: 8192
name: W10
net0: virtio=9A:80:BE:8B:54:97,bridge=vmbr0,firewall=1
numa: 1
ostype: win10
scsi0: VM-OS:vm-102-disk-0,cache=writeback,size=160G
scsihw: virtio-scsi-pci
smbios1: uuid=0db0d068-f8bf-4e93-8d27-d3f9a869b7ff
sockets: 1
vga: none
vmgenid: d8f2a3b7-6c29-45c1-8298-3ae2665e7a96

Current GRUB command line:
GRUB_CMDLINE_LINUX_DEFAULT="quiet pci=noaer amd_iommu=on iommu=pt pcie_acs_override=downstream,multifunction nofb nomodeset video=vesafb:off,efifb:off"

lspci -v output for the card:
43:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Hawaii PRO [Radeon R9 290/390] (prog-if 00 [VGA controller])
Subsystem: Gigabyte Technology Co., Ltd Hawaii PRO [Radeon R9 290/390]
Flags: bus master, fast devsel, latency 0, IRQ 46
Memory at 80000000 (64-bit, prefetchable) [size=256M]
Memory at 90000000 (64-bit, prefetchable) [size=8M]
I/O ports at 3000
Memory at 9f500000 (32-bit, non-prefetchable) [size=256K]
Expansion ROM at 9f540000 [disabled] [size=128K]
Capabilities: [48] Vendor Specific Information: Len=08 <?>
Capabilities: [50] Power Management version 3
Capabilities: [58] Express Legacy Endpoint, MSI 00
Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
Capabilities: [150] Advanced Error Reporting
Capabilities: [200] #15
Capabilities: [270] #19
Capabilities: [2b0] Address Translation Service (ATS)
Capabilities: [2c0] Page Request Interface (PRI)
Capabilities: [2d0] Process Address Space ID (PASID)
Kernel driver in use: vfio-pci
Kernel modules: radeon, amdgpu

43:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Hawaii HDMI Audio [Radeon R9 290/290X / 390/390X]
Subsystem: Gigabyte Technology Co., Ltd Hawaii HDMI Audio [Radeon R9 290/290X / 390/390X]
Flags: bus master, fast devsel, latency 0, IRQ 143
Memory at 9f560000 (64-bit, non-prefetchable) [size=16K]

Capabilities: [48] Vendor Specific Information: Len=08 <?>
Capabilities: [50] Power Management version 3
Capabilities: [58] Express Legacy Endpoint, MSI 00
Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
Capabilities: [150] Advanced Error Reporting
Kernel driver in use: vfio-pci
Kernel modules: snd_hda_intel

dmesg | grep iommu shows nothing anymore...
root@proxmox:~# dmesg | grep iommu
root@proxmox:~#

@gorfy First one to solve this deserves a beer from the other ;)

@Katyusha86 Thank you for registering to post a solution, really appreciated. I tried your suggestion as you can see on my conf file. I checked the reddit post you linked and I feel like I have the same conf as you (except you are using an nvidia and I an AMD)

@Veeh I tried to add the downstream parameter to grub with no success. I don't have a display on the VNC with or without the x-vga=1 parameter. I also tried to add flags=+pcid to the vm.conf file as per your post but no improvement
 

xi784

New Member
Feb 14, 2019
13
0
1
36
No output from dmesg | grep iommu is bad.

Is AMD-V enabled in the BIOS?
--------------------------------

hostpci0: 43:00.0,pcie=1
hostpci1: 43:00.1,pcie=1

is wrong; the right form is:
hostpci0: 43:00,pcie=1,x-vga=1

You can't split a multifunction device into separate entries.

I would also recommend dumping a ROM file from your graphics card.

hostpci0: 43:00,pcie=1,x-vga=1,romfile=xxx.rom

Some UEFI-based cards need this.
 
Apr 10, 2020
24
0
1
Of course AMD-v enabled ;)

As for the dmesg output, I had it before, but after tweaking the configuration multiple times the output disappeared.

I tried hostpci0: 43:00,pcie=1,x-vga=1 but the result is the same.
In the last output x-vga=1 was not present because I was trying @Veeh's suggestion (to see whether the VNC console was on or not).

For the ROM, I followed the PCI passthrough guide from Proxmox, but was unable to do it.

# cd /sys/bus/pci/devices/0000:01:00.0/
# echo 1 > rom
# cat rom > /tmp/image.rom
# echo 0 > rom

This does not work for me as the file rom does not exist, and I'm unable to create it. (Of course I replaced 0000:01:00.0 with the right ID.)
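
The four wiki steps assume the rom attribute is there, which is exactly what is missing here. A guarded version, purely as a sketch: dump_rom is a hypothetical wrapper around the same steps that stops early when the attribute is absent. As far as I know the attribute is only created for functions that expose an expansion ROM, so the directory should be the .0 (VGA) function rather than .1; that is an assumption, not something verified on this host.

```shell
#!/bin/sh
# Hypothetical wrapper around the wiki's ROM dump steps.
#   $1  sysfs directory of the PCI function, e.g. /sys/bus/pci/devices/<address>
#   $2  output file for the ROM image
dump_rom() {
    dev_dir=$1
    rom_out=$2
    if [ ! -e "$dev_dir/rom" ]; then
        echo "no rom attribute under $dev_dir" >&2
        return 1
    fi
    echo 1 > "$dev_dir/rom" || return 1   # enable reads of the ROM
    cat "$dev_dir/rom" > "$rom_out"
    rc=$?
    echo 0 > "$dev_dir/rom"               # always disable reads again
    return "$rc"
}
```

Called as root with the real device directory and a target such as /tmp/image.rom, it either produces the image or says up front that the attribute is missing.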
 

xi784

New Member
Feb 14, 2019
13
0
1
36
Is a monitor connected to the card?

I'm still confused by the missing IOMMU output in dmesg.

----
Can you check this:
dmesg | grep -i -e DMAR -e IOMMU

---

techpowerup.com/vgabios/

Search for your card; you may be able to get your BIOS there.
 
Apr 10, 2020
24
0
1
What do you mean by monitor? A screen? The TV is plugged into the video card, yes.
root@proxmox:~# dmesg | grep -i -e DMAR -e IOMMU
root@proxmox:~#

But dmesg is flooded with:
[ 1431.156817] vfio-pci 0000:43:00.0: BAR 0: can't reserve [mem 0x80000000-0x8fffffff 64bit pref]
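
A message like this is often reported when something on the host, typically the boot framebuffer, still holds part of the GPU's memory range. One way to check, as a sketch, is to filter /proc/iomem for framebuffer entries. The function name and the list of labels (efifb, vesafb, simplefb, BOOTFB) are assumptions based on commonly seen entries, not taken from this host.

```shell
#!/bin/sh
# Hypothetical helper: read /proc/iomem-style text on stdin and print the
# lines that look like a host framebuffer claim. Exits non-zero when no
# such line is found.
framebuffer_claims() {
    grep -iE 'efifb|vesafb|simplefb|bootfb'
}
```

Usage would be `framebuffer_claims < /proc/iomem` as root (without root the addresses are shown as zeros). A hit inside the 0x80000000-0x8fffffff range would be consistent with the host console still using the card.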
 

xi784

New Member
Feb 14, 2019
13
0
1
36
Yes.

I tried to pass the romfile option with the GPU bios, but no improvement.

You can't pass it through this way, because the card is in use by the host system. It's not impossible, but the configuration is different and not easy to set up. It also won't work in every scenario, because not every BIOS handles the primary card in the right way.

The easier way is to get a second card, e.g. an Nvidia GT 710, and use it as the primary card for the host system.
 
Apr 10, 2020
24
0
1
I was afraid to come to this conclusion...
So even with the driver blacklisted, the primary (and only) GPU is used by the host?
 

xi784

New Member
Feb 14, 2019
13
0
1
36
The problem starts with the BIOS. Some BIOSes are able to boot without any GPU, but most need at least one primary card.
The common problem in this case is that the GPU is in use by the BIOS.

There is a simple test for this: remove the card and check the web interface to see whether the host booted.

There is usually an option in the BIOS for how PCIe devices are handled, in CSM or UEFI mode. You can try playing with this option, if it exists.

So if your BIOS is able to boot without a GPU, you need to tell the kernel not to use the framebuffer, as you did on the command line with video=vesafb:off,efifb:off.

But I think you need the second GPU, which is much easier to set up.
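
For anyone who still wants to try the single-GPU route, an often-cited workaround is to detach the host console and the EFI framebuffer from the card right before starting the VM. The sketch below is not something confirmed in this thread and may not work on this board. The function name and the optional sysfs root argument are made up for the example, and efi-framebuffer.0 is the device name usually quoted for this step, assumed here.

```shell
#!/bin/sh
# Hypothetical helper: release the host's hold on the primary GPU.
#   $1  sysfs root, defaults to /sys (overridable so it can be dry-run)
release_host_framebuffer() {
    sysfs=${1:-/sys}
    # Unbind every virtual console so the kernel stops drawing on the card.
    for vt in "$sysfs"/class/vtconsole/vtcon*/bind; do
        if [ -e "$vt" ]; then
            echo 0 > "$vt"
        fi
    done
    # Unbind the EFI framebuffer platform device, if that driver is present.
    fb="$sysfs/bus/platform/drivers/efi-framebuffer/unbind"
    if [ -e "$fb" ]; then
        echo efi-framebuffer.0 > "$fb"
    fi
}
```

It would be run as root on the host just before the VM start. After that the host has no local console until reboot, so SSH or the web interface needs to be working first.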
 
