[TUTORIAL] Simple Working GPU Passthrough on UpToDate PVE and AMD Hardware

w.reimer

New Member
Jun 29, 2022
Germany
Hello Guys,
I want to share the steps that led to my fully working GPU passthrough setup, after a lot of debugging, testing and searching for the best working method. Maybe it helps someone. First, my system specs:
  • CPU: AMD Ryzen 7 7800X3D (should probably work with all Ryzen 7000 CPUs)
  • GPU: Sapphire Pulse AMD Radeon RX 7900 XTX
  • RAM: 2x 16GB G.SKILL DDR5 6000 CL30 (not that important for this passthrough, just for information)
  • Mainboard: Gigabyte B650M K
  • SSD: 2x Western Digital WD Black SN850X 2TB (not that important for this passthrough, just for information)
Current tested PVE version: 8.2.2

BIOS Settings
- CSM (Boot) has to be enabled. Otherwise you will suffer the reset bug.
- IOMMU enabled.
- SR-IOV enabled.

(I also enabled Above 4G Decoding AND Resizable BAR Support, which works perfectly in my case. This is not necessary for the passthrough, but it can give you more performance.)
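To check that the IOMMU setting actually took effect once the host is booted, a small helper like the one below can list the devices per IOMMU group. This is only a sketch: it assumes the kernel exposes the groups under /sys/kernel/iommu_groups, and the function name and the optional directory argument are my own, not part of Proxmox.

```shell
# List every PCI device together with its IOMMU group.
# Assumption: groups live under /sys/kernel/iommu_groups; the optional
# first argument (hypothetical override) points at another directory.
list_iommu_groups() {
    root="${1:-/sys/kernel/iommu_groups}"
    # No group directories at all means the IOMMU is not active.
    if ! ls "$root"/*/devices >/dev/null 2>&1; then
        echo "no IOMMU groups found under $root" >&2
        return 1
    fi
    for dev in "$root"/*/devices/*; do
        [ -e "$dev" ] || continue
        group="${dev#"$root"/}"   # strip the root prefix
        group="${group%%/*}"      # keep only the group number
        echo "group $group: ${dev##*/}"
    done
}

# On the PVE host, run it without arguments:
# list_iommu_groups
```

If the function reports no groups, IOMMU is most likely still disabled in the BIOS. Ideally the GPU and its audio function sit in a group of their own.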

Enable Modules
You need to enable the required kernel modules in Proxmox. To do so, edit the configuration with:
Code:
nano /etc/modules-load.d/modules.conf

Add these three lines:

Code:
vfio
vfio_iommu_type1
vfio_pci

Save the file and execute the following command:

Code:
update-initramfs -u -k all
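If you want to double-check the file before rebooting, something like the following sketch prints any of the three modules that is still missing. The function name is made up for this example, and it assumes one module name per line with '#' starting a comment.

```shell
# Print each required vfio module that is NOT listed in the modules file.
# Assumption: one module per line, optional trailing '#' comment.
missing_vfio_modules() {
    file="${1:-/etc/modules-load.d/modules.conf}"
    for mod in vfio vfio_iommu_type1 vfio_pci; do
        # Match the module name as the only word on a line.
        grep -Eq "^[[:space:]]*${mod}[[:space:]]*(#.*)?\$" "$file" 2>/dev/null \
            || echo "$mod"
    done
}

# On the PVE host:
# missing_vfio_modules
```

No output means all three entries are present. After the reboot, lsmod | grep vfio should show whether the modules were really loaded.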

Download GPU ROM
You need the matching ROM file for your GPU. Without it I experienced the reset bug. In my case (with my GPU) I executed the following commands to put the matching ROM file into the correct folder. Search TechPowerUp; if you're lucky they have a ROM file for your GPU and you can just edit the following lines to match yours:

Code:
cd /usr/share/kvm
wget https://www.techpowerup.com/vgabios/254079/Sapphire.RX7900XTX.24576.221129.rom
mv Sapphire.RX7900XTX.24576.221129.rom 7900xtx.rom
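A download can silently end up as an HTML error page instead of a ROM image. As a rough sanity check, the sketch below tests whether a file starts with the PCI expansion ROM signature bytes 0x55 0xAA. The function name is my own, and a passing check only says the file looks like a ROM, not that it matches your exact card.

```shell
# Succeed if FILE starts with the PCI expansion ROM signature 0x55 0xAA.
is_pci_rom() {
    [ -r "$1" ] || return 1
    # Read the first two bytes as hex, e.g. " 55 aa", and drop the spacing.
    sig=$(od -An -tx1 -N2 "$1" | tr -d ' \n')
    [ "$sig" = "55aa" ]
}

# On the PVE host, with the file name used above:
# is_pci_rom /usr/share/kvm/7900xtx.rom && echo "looks like a ROM image"
```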

Restart your PVE node!

That's it for the preparations. Now you can create a VM (I tested it with Fedora 40 Beta and Windows 11). It works perfectly with the following settings:

- CPU Type: Host
- Display: none (only after installing the system and the display driver; for the installation I used VirtIO)
- Machine: q35
- BIOS: UEFI
- Controller: VirtIO SCSI Single
- Hard Disk: Discard, IO Thread, SSD Emulation
- Network: VirtIO
- USB Device Passthrough (Mouse/Keyboard or other devices you need to use which are plugged in via USB)
- PCI Passthrough (Raw Device, 0000:XX:XX.X, all functions, ROM Bar, PCI Express, primary GPU)

If you have problems, add the GPU passthrough only after the Windows installation is complete. Don't forget to install the QEMU guest tools/drivers and enable the Guest Agent option under VM Options.

VERY IMPORTANT:
After you have created your VM AND enabled the PCI passthrough, you need to add the ROM file to the settings. Sadly there isn't a GUI feature for it, so you need to do it manually. In the shell, edit your VM config file (NODE is your machine name, pve by default; XXX stands for your VM ID):

Code:
nano /etc/pve/nodes/NODE/qemu-server/XXX.conf

Find the line with your GPU passthrough in it and add your romfile to it. In my case it looks like this:

Code:
hostpci0: 0000:03:00,pcie=1,x-vga=1,romfile=7900xtx.rom
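Instead of editing the file by hand, the same value can also be set with qm set. The sketch below builds the option string from the line above and refuses to do so if the ROM file is missing. The helper name and the VM ID 100 in the usage comment are placeholders of mine; the ROM lookup directory /usr/share/kvm is the folder used earlier in this post.

```shell
# Build the hostpci option string for a GPU with a custom ROM file.
# Arguments: PCI address without function (passes all functions),
#            ROM file name, optional ROM directory.
hostpci_value() {
    addr="$1"; rom="$2"; rom_dir="${3:-/usr/share/kvm}"
    if [ ! -f "$rom_dir/$rom" ]; then
        echo "ROM file $rom_dir/$rom not found" >&2
        return 1
    fi
    printf '%s,pcie=1,x-vga=1,romfile=%s\n' "$addr" "$rom"
}

# Hypothetical usage on the PVE host (VM ID 100 is a placeholder):
# qm set 100 --hostpci0 "$(hostpci_value 0000:03:00 7900xtx.rom)"
```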

That's it. Just save it and you can start your VM. It works absolutely perfectly in my case. I tested it with the integrated benchmark of Horizon Zero Dawn at 4K and max settings; between native Windows and the VM I got a deviation of under 1%, so practically no performance loss. My AMD drivers work without problems and even recognize Resizable BAR support / AMD Smart Access Memory. I also passed through one of my SSDs for my games, so that my Windows 11 installation can also use DirectStorage.

OPTIONAL: PVE GRUB Config
It works perfectly without it, but it may give you more performance or resolve issues in some use cases:
1. In the PVE shell type: nano /etc/default/grub
2. Change the GRUB_CMDLINE_LINUX_DEFAULT="quiet" line to GRUB_CMDLINE_LINUX_DEFAULT="quiet iommu=pt"
3. Close with CTRL + X and confirm saving with y
4. Execute: update-grub (the new kernel parameter takes effect after the next reboot)
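After the reboot you can verify that the parameter really reached the running kernel. A minimal sketch, assuming the active command line is readable from /proc/cmdline (the function name and the optional file argument are only for illustration):

```shell
# Succeed if the given parameter appears as a whole word on the
# kernel command line (default: the running kernel's /proc/cmdline).
has_kernel_param() {
    param="$1"; cmdline_file="${2:-/proc/cmdline}"
    # Put every parameter on its own line, then match the full line.
    tr -s '[:space:]' '\n' < "$cmdline_file" | grep -qxF -- "$param"
}

# On the PVE host:
# has_kernel_param iommu=pt && echo "iommu=pt is active"
```

If the parameter is missing even though /etc/default/grub was changed, the host may not boot through GRUB at all. For hosts using systemd-boot, the Proxmox documentation describes /etc/kernel/cmdline and proxmox-boot-tool refresh instead.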
 
Small remark: you probably don't need x-vga=1, which is for (old) NVidia GPUs. Also, iommu=pt does not really do anything (or do you see non-passed through devices break or get slow?).
 
The x-vga=1 flag is a must-have in my case. Without it, the first GPU passthrough after booting PVE doesn't work correctly (no output signal on the GPU). Only after shutting down the VM, adding x-vga=1 and starting the VM again do I get an output signal. If I remove the flag afterwards the VM still works, but only until I restart the host again. So it seems that this setting isn't only for old NVIDIA GPUs.

For the iommu=pt setting: I didn't see a performance decline without it in my small test, but I wanted to include it nonetheless because the official Proxmox wiki entry states:
If your hardware supports IOMMU passthrough mode, enabling this mode might increase performance. This is because VMs then bypass the (default) DMA translation normally performed by the hypervisor and instead pass DMA requests directly to the hardware IOMMU.

Other sources, like Red Hat, also recommend it if it's available. I stated at the top that it works perfectly fine without the setting. I'll move the GRUB section to the bottom as an optional config, though, so the strictly required changes are easier to see.
 
@w.reimer Thanks for sharing your steps

-> I have done exactly what you did with the "VAPOR-X" card, and my VM goes into "internal error" mode after 2-24 hours of runtime.

I have to add that your settings didn't work for me completely; maybe you had fiddled around with your machine beforehand. I specifically had to add the modprobe blacklist and the 7900 XTX card (via vfio-pci.ids) in order to not get an internal error when starting rocminfo or using the GPU.

GRUB_CMDLINE_LINUX_DEFAULT="quiet iommu=pt modprobe.blacklist=radeon vfio-pci.ids=1002:744c,1002:ab30"
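The IDs for vfio-pci.ids come from lspci -nn. Keep in mind that kernel parameters are separated by spaces, while the IDs inside vfio-pci.ids are separated by commas. The following sketch extracts the IDs for one card; the function name and the address pattern in the usage comment are only examples.

```shell
# Read `lspci -nn` output on stdin and print the vendor:device IDs of
# all lines matching PATTERN as one comma-separated list.
pci_ids_for() {
    grep -i -- "$1" \
        | grep -o '\[[0-9a-f]\{4\}:[0-9a-f]\{4\}\]' \
        | sed 's/^\[//; s/\]$//' \
        | sort -u \
        | paste -sd, -
}

# On the PVE host, for a card at 03:00 (GPU plus its audio function):
# lspci -nn | pci_ids_for '^03:00'
```

The result can be pasted directly after vfio-pci.ids= on the kernel command line.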

I've seen the error below in the syslog. The error is so critical that Proxmox thinks the machine is still running, but in reality it's stuck in internal error mode.

QEMU[3330]: error: kvm run failed Bad address
QEMU[3330]: RAX=0000000000002370 RBX=000000000000046f RCX=000300011dd5c073 RDX=000000000000046e
QEMU[3330]: RSI=ffffa5187eb02370 RDI=ffff9367dbc80000 RBP=ffffa51242077938 RSP=ffffa51242077938
QEMU[3330]: R8 =0003000000000073 R9 =ffffa5187eb00000 R10=0000000000000000 R11=0000000000000000
QEMU[3330]: R12=000000000000046f R13=ffff9367dbc80000 R14=ffff9367d1140ef0 R15=ffffa5187eb00000
QEMU[3330]: RIP=ffffffffc0801293 RFL=00000282 [--S----] CPL=0 II=0 A20=1 SMM=0 HLT=0
QEMU[3330]: ES =0000 0000000000000000 00000000 00000000
QEMU[3330]: CS =0010 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
QEMU[3330]: SS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
QEMU[3330]: DS =0000 0000000000000000 00000000 00000000
QEMU[3330]: FS =0000 00007297ba306780 00000000 00000000
QEMU[3330]: GS =0000 ffff9370dca80000 00000000 00000000
QEMU[3330]: LDT=0000 0000000000000000 00000000 00000000
QEMU[3330]: TR =0040 fffffe15407d4000 00004087 00008b00 DPL=0 TSS64-busy
QEMU[3330]: GDT= fffffe15407d2000 0000007f
QEMU[3330]: IDT= fffffe0000000000 00000fff
QEMU[3330]: CR0=80050033 CR2=00005f5b3cf1c9d8 CR3=0000000109fea000 CR4=00750ee0
QEMU[3330]: DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
QEMU[3330]: DR6=00000000ffff0ff0 DR7=0000000000000400
QEMU[3330]: EFER=0000000000200d01
QEMU[3330]: Code=55 48 21 c1 8d 04 d5 00 00 00 00 4c 09 c1 48 01 c6 48 89 e5 <48> 89 0e 31 c0 5d 31 d2 31 c9 31 f6 45 31 c0 e9 99 98 73 df 66 0f 1f 84 00 00 00 00 00 90
 
Did you get a BIOS ROM file for your specific card? It needs to be for your exact model (Vapor-X).

If you can't find one, you can dump it yourself from your card.
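One common way to dump the ROM on Linux is the rom attribute that the kernel exposes for a PCI device in sysfs: write 1 to enable it, read it, then write 0 again. The sketch below wraps that sequence. The function name is my own, the paths in the usage comment are just the ones from the first post, and reading can fail depending on how the card is currently in use, so treat this as an assumption to try rather than a guaranteed method.

```shell
# Dump the VBIOS of a PCI device through its sysfs "rom" attribute.
# Arguments: sysfs device directory, output file. Needs root.
dump_vbios() {
    dev="$1"; out="$2"
    if [ ! -w "$dev/rom" ]; then
        echo "no writable rom attribute in $dev" >&2
        return 1
    fi
    echo 1 > "$dev/rom"      # enable reading the ROM
    cat "$dev/rom" > "$out"
    status=$?
    echo 0 > "$dev/rom"      # disable it again
    return "$status"
}

# Hypothetical usage as root:
# dump_vbios /sys/bus/pci/devices/0000:03:00.0 /usr/share/kvm/7900xtx.rom
```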
 
Using this processor: model name : AMD Ryzen 9 7950X3D 16-Core Processor
PVE Version: pve-manager/8.1.4
Kernel: Linux 6.5.11-8-pve #1 SMP PREEMPT_DYNAMIC PMX 6.5.11-8 (2024-01-30T12:27Z)

Config file:
balloon: 0
boot:
cores: 4
cpu: host
cpulimit: 4
hostpci0: 0000:03:00,pcie=1,x-vga=1,romfile=xtx7900_1.rom
machine: q35
memory: 40000
meta: creation-qemu=8.1.5,ctime=1715199464
name: vm-ai-alcatros
net0: virtio=BC:24:11:84:BC:33,bridge=vmbr0,rate=70
numa: 0
ostype: l26
scsi0: containers:vm-251-disk-0,backup=0,size=300G
scsihw: virtio-scsi-pci
serial0: socket
smbios1: uuid=2117f17f-a803-499a-a559-3cbe56230430
sockets: 1
vga: std



Thanks for letting me know if something seems misconfigured
 
Which Linux distro are you using? I'm personally working with the up-to-date Fedora 40; it also worked with 39 without problems. The differences to my config are that I'm using the OVMF (UEFI) BIOS and disabled the VGA device after installation. I would recommend that you try that.

Make a fresh Linux (Fedora) install with UEFI BIOS and the q35 machine type, and install everything with the default VGA device. After everything is complete, change the VGA device to none and add the passthrough (GUI, plus the config file for the ROM).

Does the error only occur after a few hours of runtime?
 
I'm using Ubuntu 22.04, and yes, it eventually stops after a random time between 2 and 24 hours, but I can't find the real reason so far. It's in an internal error state, and even rebooting the whole Proxmox machine takes quite some time, because it still thinks the lock is required and tries to reboot the machine properly (which of course just waits for the timeout)...
 
I would try the following:
- Install Ubuntu directly on your machine, without PVE. Run the same GPU load that produced the error in the Proxmox VM. If you also get an error, then the problem isn't on the virtualization side but on the Ubuntu/driver/hardware side.
- If you don't get errors, install LACT ( https://github.com/ilya-zlobintsev/LACT/releases/ ) in your Ubuntu installation and dump your current GPU BIOS (Menu Button -> Dump VBIOS)
- Reinstall Proxmox following my recommendations, create a new Ubuntu VM with UEFI BIOS and use your previously dumped VBIOS. See whether it works now or whether the error repeats itself.
 
The title should be changed, as it is not correct. The first post itself is misleading, since the up-to-date version is Proxmox 8.2, not 8.1. A new version comes out every 1-2, causes problems, and brings changes. Something like "Partial guide to passing through a card with a random ROM from the internet on Proxmox 8.1" would be a better match.
 
I'm constantly updating my system and checking for functionality; so far it has worked with every daily update up to 8.2.2. As soon as it stops working I will change the title. I will change the info in my first post though, you are right.
 
Can't seem to make it work on my set-up :(

This is what I see on my VM running Ubuntu 22.04.4 LTS:

01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480/570/570X/580/580X/590] (rev e7) (prog-if 00 [VGA controller])
Subsystem: Sapphire Technology Limited Radeon RX 570 Pulse 4GB
Physical Slot: 0
Flags: fast devsel, IRQ 16
Memory at 380000000000 (64-bit, prefetchable) [size=256M]
Memory at 380010000000 (64-bit, prefetchable) [size=2M]
I/O ports at 8000
Memory at 81000000 (32-bit, non-prefetchable) [size=256K]
Expansion ROM at 81080000 [disabled] [size=256K]
Capabilities: [48] Vendor Specific Information: Len=08 <?>
Capabilities: [50] Power Management version 3
Capabilities: [58] Express Legacy Endpoint, MSI 00
Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
Capabilities: [150] Advanced Error Reporting
Capabilities: [200] Physical Resizable BAR
Capabilities: [2b0] Address Translation Service (ATS)
Capabilities: [2c0] Page Request Interface (PRI)
Capabilities: [320] Latency Tolerance Reporting
Kernel driver in use: amdgpu
Kernel modules: amdgpu
 
@w.reimer I've installed it bare metal and it seems to work without any interruption so far (which isn't great news, because it means the virtualization did something that made the VM halt in internal error mode).
 
Can't seem to make it work on my set-up :(

Which Proxmox Version are you using? Did you follow my steps completely? What other Hardware are you using? (CPU/Mainboard) What kind of error are you experiencing?

@Alcatros well, either way it's a step forward if we can rule out several error sources. I would recommend that you try my previous steps (dump the BIOS, create a new Proxmox installation following the steps in the first post, use a q35 / UEFI BIOS VM, add your GPU passthrough and ROM file after the VM installation is complete, and remove the virtual GPU from the config).

What kind of workload are you running on your VM at the time it crashes, if you don't mind telling?
 
I have my brother's machine, which runs Proxmox in the same versions as listed but has rombar=0 set; it has now been running for 20 hours. I will keep monitoring it and keep you updated.
The difference there is that I'm using i440fx instead of q35.

The bare-metal machine has been running the full day now; we are running some LLMs on it and it works. I will keep you updated about that one too.
 
@w.reimer
Code:
pveversion --verbose
proxmox-ve: 8.2.0 (running kernel: 6.8.4-3-pve)
pve-manager: 8.2.2 (running version: 8.2.2/9355359cd7afbae4)
proxmox-kernel-helper: 8.1.0
proxmox-kernel-6.8: 6.8.4-3
proxmox-kernel-6.8.4-3-pve-signed: 6.8.4-3
proxmox-kernel-6.8.4-2-pve-signed: 6.8.4-2
proxmox-kernel-6.5.13-5-pve-signed: 6.5.13-5
proxmox-kernel-6.5: 6.5.13-5
proxmox-kernel-6.5.11-8-pve-signed: 6.5.11-8
ceph-fuse: 17.2.7-pve2
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx8
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-4
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.1
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.3
libpve-access-control: 8.1.4
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.0.6
libpve-cluster-perl: 8.0.6
libpve-common-perl: 8.2.1
libpve-guest-common-perl: 5.1.1
libpve-http-server-perl: 5.1.0
libpve-network-perl: 0.9.8
libpve-rs-perl: 0.8.8
libpve-storage-perl: 8.2.1
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.4.0-3
proxmox-backup-client: 3.2.2-1
proxmox-backup-file-restore: 3.2.2-1
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.2.3
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.6
proxmox-widget-toolkit: 4.2.3
pve-cluster: 8.0.6
pve-container: 5.1.10
pve-docs: 8.2.2
pve-edk2-firmware: 4.2023.08-4
pve-esxi-import-tools: 0.7.0
pve-firewall: 5.0.7
pve-firmware: 3.11-1
pve-ha-manager: 4.0.4
pve-i18n: 3.2.2
pve-qemu-kvm: 8.1.5-6
pve-xtermjs: 5.3.0-3
qemu-server: 8.2.1
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.3-pve2

AMD Ryzen 7 1700X on an ASUS ROG mainboard

From other threads it seems that the 6.8 kernel may be at fault. Reverting the kernel to 6.5 will probably break Proxmox, so I'm trying not to change anything. I'm still a noob when it comes to Proxmox and Linux.
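For anyone who wants to test the kernel theory without uninstalling anything: Proxmox ships proxmox-boot-tool, which can pin an already installed kernel for booting. The sketch below only picks the newest installed kernel of a series from the pveversion output shown above; the function name is made up, sort -V assumes GNU coreutils, and whether pinning helps with the passthrough problem is untested here.

```shell
# Read `pveversion --verbose` output on stdin and print the newest
# installed kernel of the given series (e.g. 6.5) as "6.5.13-5-pve".
newest_kernel() {
    series="$1"
    # Keep only package lines such as "proxmox-kernel-6.5.13-5-pve-signed: ..."
    sed -n "s/^proxmox-kernel-\(${series}\.[0-9][0-9.-]*-pve\)\(-signed\)\{0,1\}: .*/\1/p" \
        | sort -V | tail -n 1
}

# Hypothetical usage on the PVE host:
# proxmox-boot-tool kernel pin "$(pveversion --verbose | newest_kernel 6.5)"
# To undo the pin later:
# proxmox-boot-tool kernel unpin
```

The pin only changes which installed kernel is booted by default, so the 6.8 kernel stays available.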
 
Buildmeister, I was using that kernel version too but had no success... We shall see; maybe someone who had success will post in the future.
 
