[SOLVED] GPU Passthrough with Ryzen

ninjulian_

New Member
Apr 6, 2022
Before I begin: I am very new to VMs in general and Proxmox specifically, so please excuse any stupid mistakes I might have made/will make. Thanks :)

So, I can't get GPU-Passthrough to work on my machine.

As long as no GPU is passed through to the VM, everything works absolutely fine. As soon as I pass the GPU through, the VM boots and the hardware monitor shows that the GPU is assigned to the VM, but I get nothing but a black screen. I also can't connect to the VM remotely anymore; Windows Remote Desktop just gives me an "Unable to connect" message.

My Setup:
CPU: AMD Ryzen 7 1700
RAM: 16GB G.Skill Aegis 3000Mhz
MB: MSI X370 Gaming Pro Carbon
GPU1 (to be passed through): Gigabyte RX Vega 64 OC
GPU2 (for the host): MSI GeForce GT 710

I have enabled all the necessary BIOS features, such as SVM and IOMMU. My BIOS is also set to UEFI only, and Secure Boot is enabled.

Grub:
Code:
# If you change this file, run 'update-grub' afterwards to update
# /boot/grub/grub.cfg.
# For full documentation of the options in this file, see:
#   info -f grub -n 'Simple configuration'

GRUB_DEFAULT=0
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
GRUB_CMDLINE_LINUX_DEFAULT="quiet iommu=pt amd_iommu=on video=efifb:off"
GRUB_CMDLINE_LINUX=""

# Uncomment to enable BadRAM filtering, modify to suit your needs
# This works with Linux (no patch required) and with any kernel that obtains
# the memory map information from GRUB (GNU Mach, kernel of FreeBSD ...)
#GRUB_BADRAM="0x01234567,0xfefefefe,0x89abcdef,0xefefefef"

# Uncomment to disable graphical terminal (grub-pc only)
#GRUB_TERMINAL=console

# The resolution used on graphical terminal
# note that you can use only modes which your graphic card supports via VBE
# you can see them in real GRUB with the command `vbeinfo'
#GRUB_GFXMODE=640x480

# Uncomment if you don't want GRUB to pass "root=UUID=xxx" parameter to Linux
#GRUB_DISABLE_LINUX_UUID=true

# Uncomment to disable generation of recovery mode menu entries
#GRUB_DISABLE_RECOVERY="true"

# Uncomment to get a beep at grub start
#GRUB_INIT_TUNE="480 440 1"

dmesg | grep -e DMAR -e IOMMU:
Bash:
[    0.222406] pci 0000:00:00.2: AMD-Vi: IOMMU performance counters supported
[    0.223372] pci 0000:00:00.2: AMD-Vi: Found IOMMU cap 0x40
[    0.223985] perf/amd_iommu: Detected AMD IOMMU #0 (2 banks, 4 counters/bank).

VM config:
Code:
agent: 1
bios: ovmf
boot: order=scsi0;net0
cores: 8
efidisk0: local:100/vm-100-disk-1.qcow2,efitype=4m,pre-enrolled-keys=1,size=528K
hostpci0: 0000:2a:00,x-vga=on,pcie=1,romfile=Gigabyte.RXVega64.8192.180110.rom
machine: pc-q35-6.1
memory: 8192
meta: creation-qemu=6.1.0,ctime=1649277898
name: Windows10-GPU
net0: virtio=F2:74:01:9B:85:7A,bridge=vmbr0,firewall=1
numa: 0
ostype: win10
scsi0: local:100/vm-100-disk-0.qcow2,cache=writeback,size=256G
scsihw: virtio-scsi-pci
smbios1: uuid=13c389dd-3b1b-4d03-a212-bbb19c939019
sockets: 1
vmgenid: 627d57c8-0650-4823-9533-2189b9b3ebec

I have also added the necessary kernel modules, blacklisted the amdgpu and radeon drivers, and bound the vfio-pci driver to the card, as can be seen here:

Bash:
2a:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Vega 10 XL/XT [Radeon RX Vega 56/64] (rev c1) (prog-if 00 [VGA controller])
    Subsystem: Gigabyte Technology Co., Ltd Vega 10 XL/XT [Radeon RX Vega 56/64]
    Flags: bus master, fast devsel, latency 0, IRQ 42, IOMMU group 17
    Memory at 7fd0000000 (64-bit, prefetchable) [size=256M]
    Memory at 7fe0000000 (64-bit, prefetchable) [size=2M]
    I/O ports at d000 [size=256]
    Memory at f7800000 (32-bit, non-prefetchable) [size=512K]
    Expansion ROM at f7880000 [disabled] [size=128K]
    Capabilities: [48] Vendor Specific Information: Len=08 <?>
    Capabilities: [50] Power Management version 3
    Capabilities: [64] Express Legacy Endpoint, MSI 00
    Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
    Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
    Capabilities: [150] Advanced Error Reporting
    Capabilities: [200] Physical Resizable BAR
    Capabilities: [270] Secondary PCI Express
    Capabilities: [2a0] Access Control Services
    Capabilities: [2b0] Address Translation Service (ATS)
    Capabilities: [2c0] Page Request Interface (PRI)
    Capabilities: [2d0] Process Address Space ID (PASID)
    Capabilities: [320] Latency Tolerance Reporting
    Kernel driver in use: vfio-pci
    Kernel modules: amdgpu

2a:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Vega 10 HDMI Audio [Radeon Vega 56/64]
    Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Vega 10 HDMI Audio [Radeon Vega 56/64]
    Flags: fast devsel, IRQ 81, IOMMU group 18
    Memory at f78a0000 (32-bit, non-prefetchable) [disabled] [size=16K]
    Capabilities: [48] Vendor Specific Information: Len=08 <?>
    Capabilities: [50] Power Management version 3
    Capabilities: [64] Express Endpoint, MSI 00
    Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
    Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
    Capabilities: [150] Advanced Error Reporting
    Capabilities: [2a0] Access Control Services
    Kernel driver in use: vfio-pci
    Kernel modules: snd_hda_intel
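
For anyone who wants to check the binding without reading through the full listing, the same information can probably be read from sysfs, where each PCI device directory has a driver symlink. This is only a minimal sketch: the function name bound_driver and the SYSFS_ROOT override are made up for illustration (the override exists so the function can be tried against a fake directory tree).

```shell
#!/bin/sh
# Print the kernel driver currently bound to a PCI function by following
# the sysfs "driver" symlink. Prints "none" if no driver is bound.
# SYSFS_ROOT is a hypothetical override used only for dry runs.
bound_driver() {
    dev="$1"
    link="${SYSFS_ROOT:-/sys}/bus/pci/devices/$dev/driver"
    if [ -L "$link" ]; then
        # The symlink target ends in the driver name, e.g. .../drivers/vfio-pci
        basename "$(readlink "$link")"
    else
        echo "none"
    fi
}
```

On the setup described above, bound_driver 0000:2a:00.0 and bound_driver 0000:2a:00.1 would both be expected to report vfio-pci.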

I have also tried switching the PCIe slots the cards are in, but the only thing that accomplished is that the AMD card now shows the GRUB menu during boot.

If you have any ideas on how to get this working, I'd be massively thankful :-D.
 
IOMMU needs to be enabled in the BIOS. After that, check whether the GPU is in its own IOMMU group; if it is, try again. You cannot use the GPU on the host and in the guest at the same time, so you need a second GPU for the host.
 
Sounds like the GPU does not reset properly, which the Vega 64 is notorious for. Install vendor-reset by following this guide and see if that helps.
If you are using kernel 5.15, you need to run echo 'device_specific' >"/sys/bus/pci/devices/0000:2a:00.0/reset_method" before starting the VM.
 
IOMMU needs to be enabled in the BIOS. After that, check whether the GPU is in its own IOMMU group; if it is, try again. You cannot use the GPU on the host and in the guest at the same time, so you need a second GPU for the host.
As I have written, IOMMU is enabled, and I am already using a second GPU for the host (that's what the GT 710 is doing in my system). Both GPUs are in their own IOMMU groups.

Bash:
IOMMU Group 0 00:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
IOMMU Group 1 00:01.3 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge [1022:1453]
IOMMU Group 2 00:02.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
IOMMU Group 3 00:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
IOMMU Group 4 00:03.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge [1022:1453]
IOMMU Group 5 00:03.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge [1022:1453]
IOMMU Group 6 00:04.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
IOMMU Group 7 00:07.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
IOMMU Group 8 00:07.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Internal PCIe GPP Bridge 0 to Bus B [1022:1454]
IOMMU Group 9 00:08.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
IOMMU Group 10 00:08.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Internal PCIe GPP Bridge 0 to Bus B [1022:1454]
IOMMU Group 11 00:14.0 SMBus [0c05]: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller [1022:790b] (rev 59)
IOMMU Group 11 00:14.3 ISA bridge [0601]: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge [1022:790e] (rev 51)
IOMMU Group 12 00:18.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 0 [1022:1460]
IOMMU Group 12 00:18.1 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 1 [1022:1461]
IOMMU Group 12 00:18.2 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 2 [1022:1462]
IOMMU Group 12 00:18.3 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 3 [1022:1463]
IOMMU Group 12 00:18.4 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 4 [1022:1464]
IOMMU Group 12 00:18.5 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 5 [1022:1465]
IOMMU Group 12 00:18.6 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 6 [1022:1466]
IOMMU Group 12 00:18.7 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 7 [1022:1467]
IOMMU Group 13 03:00.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] X370 Series Chipset USB 3.1 xHCI Controller [1022:43b9] (rev 02)
IOMMU Group 13 03:00.1 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] X370 Series Chipset SATA Controller [1022:43b5] (rev 02)
IOMMU Group 13 03:00.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] X370 Series Chipset PCIe Upstream Port [1022:43b0] (rev 02)
IOMMU Group 13 20:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 300 Series Chipset PCIe Port [1022:43b4] (rev 02)
IOMMU Group 13 20:01.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 300 Series Chipset PCIe Port [1022:43b4] (rev 02)
IOMMU Group 13 20:02.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 300 Series Chipset PCIe Port [1022:43b4] (rev 02)
IOMMU Group 13 20:03.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 300 Series Chipset PCIe Port [1022:43b4] (rev 02)
IOMMU Group 13 20:04.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 300 Series Chipset PCIe Port [1022:43b4] (rev 02)
IOMMU Group 13 20:08.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 300 Series Chipset PCIe Port [1022:43b4] (rev 02)
IOMMU Group 13 21:00.0 Ethernet controller [0200]: Intel Corporation I211 Gigabit Network Connection [8086:1539] (rev 03)
IOMMU Group 13 26:00.0 USB controller [0c03]: ASMedia Technology Inc. ASM2142 USB 3.1 Host Controller [1b21:2142]
IOMMU Group 14 27:00.0 VGA compatible controller [0300]: NVIDIA Corporation GK208B [GeForce GT 710] [10de:128b] (rev a1)
IOMMU Group 14 27:00.1 Audio device [0403]: NVIDIA Corporation GK208 HDMI/DP Audio Controller [10de:0e0f] (rev a1)
IOMMU Group 15 28:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Vega 10 PCIe Bridge [1022:1470] (rev c1)
IOMMU Group 16 29:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Vega 10 PCIe Bridge [1022:1471]
IOMMU Group 17 2a:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Vega 10 XL/XT [Radeon RX Vega 56/64] [1002:687f] (rev c1)
IOMMU Group 18 2a:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Vega 10 HDMI Audio [Radeon Vega 56/64] [1002:aaf8]
IOMMU Group 19 2b:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Zeppelin/Raven/Raven2 PCIe Dummy Function [1022:145a]
IOMMU Group 20 2b:00.2 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Platform Security Processor [1022:1456]
IOMMU Group 21 2b:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) USB 3.0 Host Controller [1022:145c]
IOMMU Group 22 2c:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Zeppelin/Renoir PCIe Dummy Function [1022:1455]
IOMMU Group 23 2c:00.2 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] [1022:7901] (rev 51)
IOMMU Group 24 2c:00.3 Audio device [0403]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) HD Audio Controller [1022:1457]
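
A listing like the one above can be produced by walking /sys/kernel/iommu_groups. The following is a rough sketch rather than the exact command used for the output in this post: it prints only the group number and PCI address (no lspci description), the function name list_iommu_groups is made up, and the optional directory argument is there only so it can be tried against a test tree. Groups come out in shell glob order, so they are not sorted numerically.

```shell
#!/bin/sh
# Print one "IOMMU Group <n> <pci-address>" line per device.
# $1 optionally overrides the sysfs directory (assumption, for dry runs).
list_iommu_groups() {
    root="${1:-/sys/kernel/iommu_groups}"
    for dev in "$root"/*/devices/*; do
        # Skip the unexpanded pattern when no groups exist (IOMMU off).
        [ -e "$dev" ] || continue
        group="${dev%/devices/*}"   # strip the trailing /devices/<address>
        group="${group##*/}"        # keep only the group number
        printf 'IOMMU Group %s %s\n' "$group" "${dev##*/}"
    done
}
```

A GPU that sits alone in its group (apart from its own audio function) is what you want to see before passing it through; no output at all usually means the IOMMU is not active.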

Sounds like the GPU does not reset properly, which the Vega 64 is notorious for. Install vendor-reset by following this guide and see if that helps.
If you are using kernel 5.15, you need to run echo 'device_specific' >"/sys/bus/pci/devices/0000:2a:00.0/reset_method" before starting the VM.
Thank you very much for your help; sadly, it didn't solve the problem. After failing to get it installed properly and troubleshooting for a while, I realised that I was using the wrong repositories and got it working. But instead of the dmesg output that I should get according to the guide, I only got:

Bash:
[  292.615586] vfio-pci 0000:2a:00.0: AMD_VEGA10: version 1.0
[  292.615590] vfio-pci 0000:2a:00.0: AMD_VEGA10: performing pre-reset
[  292.615697] vfio-pci 0000:2a:00.0: AMD_VEGA10: performing reset
[  292.930380] ATOM BIOS: xxx-xxx-xxx
[  292.930381] vendor-reset-drm: atomfirmware: bios_scratch_reg_offset initialized to 4c
[  293.156445] vfio-pci 0000:2a:00.0: AMD_VEGA10: bus reset disabled? yes
[  293.156451] vfio-pci 0000:2a:00.0: AMD_VEGA10: SMU response reg: ffffffff, sol reg: 0, mp1 intr enabled? no, bl ready? no, baco? off
[  293.156455] vfio-pci 0000:2a:00.0: AMD_VEGA10: performing post-reset
[  293.192921] vfio-pci 0000:2a:00.0: AMD_VEGA10: reset result = 0
over and over again.
 
I'm sorry but I think you picked the worst GPU for passthrough. It looks like vendor-reset is working but cannot reset your GPU. Although you appear to have an older version (1.0 instead of 1.1 but that might just be cosmetic). Are you sure you did a git pull and are on the master branch? You could try asking for help at vendor-reset, as they know more about your GPU and how to build the latest version of the module.
 
I'm sorry but I think you picked the worst GPU for passthrough.
Yeah, I have a bit of a habit of picking the worst possible hardware for a task, months or years before I even know I want to accomplish it...
It looks like vendor-reset is working but cannot reset your GPU. Although you appear to have an older version (1.0 instead of 1.1 but that might just be cosmetic). Are you sure you did a git pull and are on the master branch?

I followed the guide and did a git clone. Maybe that's the issue, although I doubt it. How would I go about updating the version of vendor-reset?

You could try asking for help at vendor-reset, as they know more about your GPU and how to build the latest version of the module.

I will absolutely do that, thanks so much.
 
Yeah, I have a bit of a habit of picking the worst possible hardware for a task, months or years before I even know I want to accomplish it...
No worries, I ran into similar things too. PCI passthrough is always a bit of trial and error, and I only wish I had a GPU as powerful as yours.
I followed the guide and did a git clone. Maybe that's the issue, although I doubt it. How would I go about updating the version of vendor-reset?
That sounds normal. What did you mean with wrong repositories before? What does git branch tell you? Does git pull go and get more changes?
I will absolutely do that, thanks so much.
I don't want to send you there just because your issue is difficult. Maybe I am wrong, but it sounds like the problem is with the GPU reset and not really Proxmox or its configuration.
 
No worries, I ran into similar things too. PCI passthrough is always a bit of trial and error, and I only wish I had a GPU as powerful as yours.

That sounds normal. What did you mean with wrong repositories before? What does git branch tell you? Does git pull go and get more changes?
I am not very familiar with git, so please excuse me for not adding this before: git branch returns * master and git pull returns Already up to date.

I don't want to send you there just because your issue is difficult. Maybe I am wrong, but it sounds like the problem is with the GPU reset and not really Proxmox or its configuration.
Well, I was nearly done writing my answer, telling you that I agree, when I got this weird instinct to mess with my config file... I noticed that I was still using the romfile option and thought that could be a problem, so I took it out, and voilà, it freaking works!!! Thank you so much for your advice, you really made my day.
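
For readers who want to try the same change, here is a small sketch of what removing the option amounts to. It only rewrites a hostpci line given on standard input and does not touch any VM configuration by itself; the helper name strip_romfile is made up for illustration.

```shell
#!/bin/sh
# Remove a ",romfile=<name>" option from a hostpciN line read on stdin,
# leaving all other options untouched.
strip_romfile() {
    sed -E 's/,romfile=[^,]*//'
}

# Example with the hostpci0 line from the VM config posted earlier:
echo 'hostpci0: 0000:2a:00,x-vga=on,pcie=1,romfile=Gigabyte.RXVega64.8192.180110.rom' | strip_romfile
```

On the Proxmox host, the equivalent change could presumably be applied with qm set 100 --hostpci0 0000:2a:00,x-vga=on,pcie=1, assuming the VM ID is 100 (inferred from the disk names in the config above, not stated explicitly in the thread).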

The only issue I have now is that whenever I try to use cpu: host, Windows won't boot properly and just gives me the "your device ran into a problem, you need to restart" screen, and when I try to use cpu: host,hidden=1, it gets stuck on a seemingly infinite loading screen.

Edit: I'll mark this thread as solved anyways, because that seems to be a whole different problem...
 
Sorry to revive an old thread, but I'd like to report that vendor-reset did solve the Code 43 issue with a Vega 56 here.

One thing I'd like to add is that I didn't succeed on the first try. Then I noticed this quote from vendor-reset:
This module must be loaded EARLY, the default reset the kernel will try to perform completely breaks the GPU which this module can not recover from.
I had the VM set to start on boot, so the GPU was already in a non-recoverable state by the time I shut down the VM and tried vendor-reset on it. I disabled start on boot, rebooted the host, set reset_method to device_specific, and started the VM. It worked beautifully!

I should probably make a snippet to streamline the process.
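
One way such a snippet might look is a Proxmox hookscript, which Proxmox calls with the VM ID and a phase name (pre-start, post-start, pre-stop, post-stop). The sketch below is untested on real hardware and rests on assumptions: the PCI address 0000:2a:00.0 is the one from the first post and will differ on other systems, and GPU_ADDR, SYSFS_ROOT and the function names are made up for illustration (SYSFS_ROOT only exists so the script can be dry-run against a fake directory).

```shell
#!/bin/sh
# Hookscript sketch: select the device_specific reset method for the GPU
# right before the VM starts. Requires a kernel that exposes reset_method
# (5.15 or later, per the advice earlier in the thread) and vendor-reset loaded.
GPU_ADDR="${GPU_ADDR:-0000:2a:00.0}"   # assumption: address from the first post
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"       # hypothetical override for dry runs

set_device_specific_reset() {
    method_file="$SYSFS_ROOT/bus/pci/devices/$GPU_ADDR/reset_method"
    if [ -w "$method_file" ]; then
        echo 'device_specific' > "$method_file"
    else
        echo "reset_method not available for $GPU_ADDR" >&2
        return 1
    fi
}

handle_phase() {
    case "$1" in
        pre-start) set_device_specific_reset ;;
        *) : ;;   # nothing to do in the other phases
    esac
}

# Proxmox invokes a hookscript as: <script> <vmid> <phase>
if [ "$#" -ge 2 ]; then
    handle_phase "$2"
fi
```

If saved as an executable file on a storage that allows snippets, it could be attached with something like qm set 100 --hookscript local:snippets/vega-reset.sh (the VM ID and file name are placeholders). Note that this does not replace the advice above: the reset method still has to be in place before the first VM start after the host boots.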
 