[SOLVED] GPU Passthrough with RTX 3090 Doesn't work & DMAR Errors

justingr1

New Member
Sep 20, 2022
Hello dear Proxmox Community! :)

I'm trying to set up GPU passthrough with my Nvidia RTX 3090, but I can't get the card to work.

Current software versions:

PVE: 7.3-4
Kernel version: Linux 5.15.83-1-pve

My hardware:

TUF GAMING B660M-PLUS D4 (BIOS updated yesterday, including Intel ME)
64GB of RAM
Intel i7-13700K

BIOS settings:

IOMMU = Enabled
VT-d = Enabled
Intel Virtualization = Enabled
SR-IOV = Enabled
Above 4G Decoding = Enabled

My /etc/default/grub:

Code:
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt pcie_acs_override=downstream initcall_blacklist=sysfb_init vfio_pci.ids=10de:2204,10de:1aef"

Output of cat /proc/cmdline:

Code:
BOOT_IMAGE=/boot/vmlinuz-5.15.83-1-pve root=/dev/mapper/pve-root ro quiet intel_iommu=on iommu=pt pcie_acs_override=downstream initcall_blacklist=sysfb_init vfio_pci.ids=10de:2204,10de:1aef intel_iommu=on
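In the output above, `intel_iommu=on` appears twice, presumably because it is set in a second place in the GRUB configuration; a duplicated parameter like this is generally harmless. A minimal sketch for spotting duplicated kernel parameters (the helper name `dup_params` is illustrative, not an existing tool):

```shell
#!/bin/bash
# dup_params (illustrative helper): print kernel parameters that occur more than once.
# $1: a kernel command line; defaults to the running kernel's /proc/cmdline.
dup_params() {
    local cmdline="${1:-$(cat /proc/cmdline)}"
    # One parameter per line, then keep only the lines that repeat.
    tr -s ' ' '\n' <<< "$cmdline" | sort | uniq -d
}

# Usage on the host:
#   dup_params
```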

My /etc/modprobe.d/blacklist.conf:

Code:
blacklist radeon
blacklist nouveau
blacklist nvidia
blacklist snd_hda_intel

My /etc/modules:

Code:
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd

I'm getting some DMAR errors (dmesg | grep -e DMAR -e IOMMU):

Code:
[    0.000000] Warning: PCIe ACS overrides enabled; This may allow non-IOMMU protected peer-to-peer DMA
[    0.005879] ACPI: DMAR 0x0000000071FA1000 000088 (v02 INTEL  EDK2     00000002      01000013)
[    0.005913] ACPI: Reserving DMAR table memory at [mem 0x71fa1000-0x71fa1087]
[    0.085759] DMAR: IOMMU enabled
[    0.085783] DMAR: IOMMU enabled
[    0.207224] DMAR: Host address width 39
[    0.207225] DMAR: DRHD base: 0x000000fed90000 flags: 0x0
[    0.207227] DMAR: dmar0: reg_base_addr fed90000 ver 4:0 cap 1c0000c40660462 ecap 29a00f0505e
[    0.207229] DMAR: DRHD base: 0x000000fed91000 flags: 0x1
[    0.207230] DMAR: dmar1: reg_base_addr fed91000 ver 5:0 cap d2008c40660462 ecap f050da
[    0.207233] DMAR: RMRR base: 0x0000007c000000 end: 0x000000807fffff
[    0.207234] DMAR-IR: IOAPIC id 2 under DRHD base  0xfed91000 IOMMU 1
[    0.207235] DMAR-IR: HPET id 0 under DRHD base 0xfed91000
[    0.207235] DMAR-IR: Queued invalidation will be enabled to support x2apic and Intr-remapping.
[    0.208780] DMAR-IR: Enabled IRQ remapping in x2apic mode
[    0.372916] pci 0000:00:02.0: DMAR: Skip IOMMU disabling for graphics
[    0.423115] DMAR: No ATSR found
[    0.423115] DMAR: No SATC found
[    0.423116] DMAR: IOMMU feature fl1gp_support inconsistent
[    0.423117] DMAR: IOMMU feature pgsel_inv inconsistent
[    0.423118] DMAR: IOMMU feature nwfs inconsistent
[    0.423118] DMAR: IOMMU feature dit inconsistent
[    0.423118] DMAR: IOMMU feature sc_support inconsistent
[    0.423119] DMAR: IOMMU feature dev_iotlb_support inconsistent
[    0.423119] DMAR: dmar0: Using Queued invalidation
[    0.423121] DMAR: dmar1: Using Queued invalidation
[    0.423879] DMAR: Intel(R) Virtualization Technology for Directed I/O
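Most of these lines are generally informational rather than errors: the `IOMMU feature ... inconsistent` messages say that the two remapping units (dmar0 and dmar1) advertise different optional features. The lines that matter for passthrough are `DMAR: IOMMU enabled` and `DMAR-IR: Enabled IRQ remapping`, and both are present in this log. A sketch that checks for exactly those two strings (the helper name is illustrative; the match strings are taken from the log above):

```shell
#!/bin/bash
# check_dmar (illustrative helper): read kernel log text on stdin and report
# whether the IOMMU and interrupt remapping came up.
check_dmar() {
    local log
    log=$(cat)
    if grep -q 'DMAR: IOMMU enabled' <<< "$log"; then
        echo "IOMMU: enabled"
    else
        echo "IOMMU: NOT enabled"
    fi
    if grep -q 'Enabled IRQ remapping' <<< "$log"; then
        echo "IRQ remapping: enabled"
    else
        echo "IRQ remapping: NOT enabled"
    fi
}

# Usage on the host:
#   dmesg | check_dmar
```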

IOMMU is enabled; these are my IOMMU groups:

Code:
IOMMU group 0 00:00.0 Host bridge [0600]: Intel Corporation Device [8086:a703] (rev 01)
IOMMU group 10 00:1a.0 PCI bridge [0604]: Intel Corporation Device [8086:7ac8] (rev 11)
IOMMU group 11 00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:7ab8] (rev 11)
IOMMU group 12 00:1c.2 PCI bridge [0604]: Intel Corporation Device [8086:7aba] (rev 11)
IOMMU group 13 00:1c.4 PCI bridge [0604]: Intel Corporation Device [8086:7abc] (rev 11)
IOMMU group 14 00:1f.0 ISA bridge [0601]: Intel Corporation Device [8086:7a86] (rev 11)
IOMMU group 14 00:1f.3 Audio device [0403]: Intel Corporation Device [8086:7ad0] (rev 11)
IOMMU group 14 00:1f.4 SMBus [0c05]: Intel Corporation Device [8086:7aa3] (rev 11)
IOMMU group 14 00:1f.5 Serial bus controller [0c80]: Intel Corporation Device [8086:7aa4] (rev 11)
IOMMU group 15 01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GA102 [GeForce RTX 3090] [10de:2204] (rev a1)
IOMMU group 15 01:00.1 Audio device [0403]: NVIDIA Corporation GA102 High Definition Audio Controller [10de:1aef] (rev a1)
IOMMU group 16 02:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller PM9A1/980PRO [144d:a80a]
IOMMU group 17 04:00.0 USB controller [0c03]: ASMedia Technology Inc. ASM2142 USB 3.1 Host Controller [1b21:2142]
IOMMU group 18 05:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8125 2.5GbE Controller [10ec:8125] (rev 05)
IOMMU group 19 06:00.0 PCI bridge [0604]: ASMedia Technology Inc. Device [1b21:1806] (rev 01)
IOMMU group 1 00:01.0 PCI bridge [0604]: Intel Corporation Device [8086:a70d] (rev 01)
IOMMU group 20 07:00.0 PCI bridge [0604]: ASMedia Technology Inc. Device [1b21:1806] (rev 01)
IOMMU group 21 07:02.0 PCI bridge [0604]: ASMedia Technology Inc. Device [1b21:1806] (rev 01)
IOMMU group 22 07:06.0 PCI bridge [0604]: ASMedia Technology Inc. Device [1b21:1806] (rev 01)
IOMMU group 23 07:0e.0 PCI bridge [0604]: ASMedia Technology Inc. Device [1b21:1806] (rev 01)
IOMMU group 24 08:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8125 2.5GbE Controller [10ec:8125] (rev 04)
IOMMU group 25 09:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8125 2.5GbE Controller [10ec:8125] (rev 04)
IOMMU group 2 00:02.0 VGA compatible controller [0300]: Intel Corporation Device [8086:a780] (rev 04)
IOMMU group 3 00:06.0 PCI bridge [0604]: Intel Corporation Device [8086:a74d] (rev 01)
IOMMU group 4 00:0a.0 Signal processing controller [1180]: Intel Corporation Device [8086:a77d] (rev 01)
IOMMU group 5 00:0e.0 RAID bus controller [0104]: Intel Corporation Device [8086:a77f]
IOMMU group 6 00:14.0 USB controller [0c03]: Intel Corporation Device [8086:7ae0] (rev 11)
IOMMU group 6 00:14.2 RAM memory [0500]: Intel Corporation Device [8086:7aa7] (rev 11)
IOMMU group 7 00:15.0 Serial bus controller [0c80]: Intel Corporation Device [8086:7acc] (rev 11)
IOMMU group 7 00:15.1 Serial bus controller [0c80]: Intel Corporation Device [8086:7acd] (rev 11)
IOMMU group 7 00:15.2 Serial bus controller [0c80]: Intel Corporation Device [8086:7ace] (rev 11)
IOMMU group 8 00:16.0 Communication controller [0780]: Intel Corporation Device [8086:7ae8] (rev 11)
IOMMU group 9 00:17.0 SATA controller [0106]: Intel Corporation Device [8086:7ae2] (rev 11)
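A listing in this format can be produced by walking `/sys/kernel/iommu_groups`, much like the loop that is commonly shared in passthrough guides. Below is a sketch of that approach; the `IOMMU_ROOT` override is an illustrative addition so it can be tried against a fake directory tree. In the listing above, the GPU (01:00.0) and its audio function (01:00.1) sit alone in group 15, which is what you want for passthrough.

```shell
#!/bin/bash
# list_iommu_groups (illustrative helper): print one line per PCI device,
# prefixed with its IOMMU group number.
list_iommu_groups() {
    local root="${IOMMU_ROOT:-/sys/kernel/iommu_groups}"
    local d group
    for d in "$root"/*/devices/*; do
        [ -e "$d" ] || continue                      # glob matched nothing
        group=$(basename "$(dirname "$(dirname "$d")")")
        printf 'IOMMU group %s ' "$group"
        lspci -nns "$(basename "$d")"                # e.g. 0000:01:00.0
    done
}

# Usage on the host (sort -V puts group 2 before group 15):
#   list_iommu_groups | sort -V
```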

My /etc/modprobe.d/kvm.conf:

Code:
options kvm ignore_msrs=1

My VM Conf:

agent: 1
args: -cpu host,hv_vapic,+invtsc,-hypervisor
balloon: 0
bios: ovmf
boot: order=sata0;net0
cores: 24
cpu: host,hidden=1,flags=+pcid;+spec-ctrl;+ssbd;+hv-evmcs;+aes
cpuunits: 4500
efidisk0: POOL1:vm-300-disk-0,efitype=4m,pre-enrolled-keys=1,size=1M
hookscript: local:snippets/gpu-hookscript.sh
hostpci0: 0000:01:00,pcie=1,x-vga=on,romfile=Gigabyte.RTX3090.24576.200904.rom
hotplug: disk,network,usb,cpu
kvm: 1
machine: pc-q35-7.1
memory: 60000
meta: creation-qemu=7.1.0,ctime=1672163581
name: JITS08-GS
net0: e1000=E4:70:B8:00:B7:77,bridge=vmbr1,firewall=1
numa: 1
onboot: 1
ostype: win11
sata0: POOL1:vm-300-disk-1,cache=writethrough,discard=on,size=160G,ssd=1
sata1: POOL1:vm-300-disk-3,cache=writethrough,discard=on,size=6600G,ssd=1
scsihw: lsi
smbios1: uuid=b3813edc-4608-485c-a49d-68f352e53df1,manufacturer=QVNVU1RlSyBDT01QVVRFUiBJTkMu,product=VFVGIEdBTUlORyBCNjYwTS1QTFVTIEQ0,version=UmV2IDEueHg=,serial=MjExMTk0MDQwNjAwMTkw,sku=QUxQSEEtSklUUzA4LUdT,family=QUxQSEEgVEVSTUlOQUwgU0VSVkVS,base64=1
sockets: 1
tablet: 1
tpmstate0: POOL1:vm-300-disk-2,size=4M,version=v2.0
vcpus: 24
vga: none
vmgenid: f94603af-eeaf-424c-b565-2e581f8525b0


The VM starts without errors in the GUI, but I can't ping it.
After I start the VM, RAM usage rises to around 90%.
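The RAM jump itself is probably expected: with a PCI device passed through, all of the guest's memory has to be pinned for DMA, so the full `memory: 60000` (MiB) from the VM config is allocated as soon as the VM starts, and ballooning cannot give it back. A quick check of the arithmetic, assuming the host's 64 GB counts as 65536 MiB (the real `MemTotal` is somewhat lower, which pushes the percentage up). The helper name is illustrative:

```shell
#!/bin/bash
# mem_share (illustrative helper): percentage of host RAM a VM's memory setting takes.
# $1: VM memory in MiB (the "memory:" line of the VM config)
# $2: host RAM in MiB; defaults to MemTotal from /proc/meminfo (reported in kB)
mem_share() {
    local vm_mib="$1"
    local host_mib="${2:-$(awk '/^MemTotal:/ { print int($2 / 1024) }' /proc/meminfo)}"
    awk -v vm="$vm_mib" -v host="$host_mib" 'BEGIN { printf "%.1f\n", vm * 100 / host }'
}

# Usage on the host:
#   mem_share 60000
```

`mem_share 60000 65536` prints 91.6, which lines up with the roughly 90% reported here.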

I've already tried and tested many workarounds, but nothing seems to help. Maybe someone more experienced has an idea?

I also tested a hookscript from the forum:

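#!/bin/bash
# Proxmox calls hookscripts as: <script> <vmid> <phase>

if [ "$2" == "pre-start" ]
then
    echo "gpu-hookscript: Resetting GPU for Virtual Machine $1"
    # Remove the GPU from the PCI bus, then rescan so the kernel re-enumerates it
    echo 1 > /sys/bus/pci/devices/0000\:01\:00.0/remove
    echo 1 > /sys/bus/pci/rescan
fi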
 
Resolved:

If anyone else ever runs into this error: I had to disable Resizable BAR (ReBAR) in the BIOS.

I tested this successfully on 2 different Proxmox hosts, both with ASUS motherboards (one TUF series, the other ROG Strix series), and with 3 different Nvidia cards (Gigabyte RTX 3090, Quadro M4000, Quadro P2000).

If you're trying to get out of this rabbit hole, please try this guide: https://hackmd.io/@edingroot/SkGD3Q7Wv

And double-check your BIOS settings.
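To confirm from the host that ReBAR is really off after changing the BIOS setting, you can look at the card's Resizable BAR capability in `lspci -vvv` (run it as root, otherwise capabilities are hidden). The sketch below only pulls out the `current size` lines; it assumes the capability is printed as `Physical Resizable BAR` followed by `BAR n: current size: ...` lines, as recent pciutils versions do. With ReBAR active, the large BAR of these cards typically reports a size in the GB range instead of 256MB. The helper name is illustrative:

```shell
#!/bin/bash
# rebar_sizes (illustrative helper): read `lspci -vvv` output for one device on
# stdin and print the "current size" of each BAR listed under the Resizable BAR
# capability.
rebar_sizes() {
    awk '
        /Resizable BAR/ { in_cap = 1; next }              # capability header
        in_cap && /BAR [0-9]+: current size:/ {
            sub(/^[ \t]+/, "")                            # drop indentation
            sub(/,.*/, "")                                # drop the supported sizes
            print
            next
        }
        in_cap { in_cap = 0 }                             # left the capability block
    '
}

# Usage on the host (adjust the address to your GPU):
#   lspci -vvv -s 01:00.0 | rebar_sizes
```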
 
Hey! By combining some info from here and from another forum, I got Proxmox to pass through my RTX 3090 perfectly (with screen output as well).

I followed this tutorial:
https://www.reddit.com/r/homelab/comments/b5xpua/the_ultimate_beginners_guide_to_gpu_passthrough/

However, it did not cover keeping the host from claiming the GPU at boot via the GRUB kernel command line, so I had to take these steps:

Get the vendor:device IDs for vfio-pci.ids like this:

lspci -nn | grep -i nvidia
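If you'd rather not copy the IDs by hand, the `[vendor:device]` pairs can be pulled out of the `lspci -nn` output directly. A small sketch (the helper name is illustrative); selecting the slot with `-s 01:00` picks up both the VGA and the audio function, assuming the GPU sits at 01:00 as in this thread:

```shell
#!/bin/bash
# extract_ids (illustrative helper): read `lspci -nn` lines on stdin and print the
# [vendor:device] IDs as a comma-separated list for vfio-pci.ids.
# The class code (e.g. [0300]) has no colon, so it is not matched.
extract_ids() {
    grep -o '\[[0-9a-f]\{4\}:[0-9a-f]\{4\}\]' | sed 's/[][]//g' | paste -sd, -
}

# Usage on the host:
#   lspci -nn -s 01:00 | extract_ids
```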

Then add them to the GRUB config:

nano /etc/default/grub

Code:
"quiet intel_iommu=on iommu=pt nomodeset initcall_blacklist=sysfb_init vfio-pci.ids=10de:2204,10de:1aef disable_vga=1"

In the end it should look similar to this:

Code:
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt pcie_acs_override=downstream,multifunction nofb nomodeset initcall_blacklist=sysfb_init video=vesafb:off,efifb:off vfio-pci.ids=10de:2204,10de:1aef disable_vga=1"

Then update GRUB:
update-grub

Reboot the system:
reboot
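After the reboot, it's worth confirming that both functions of the card are bound to `vfio-pci` rather than to a host driver; `lspci -nnk -s 01:00` shows this as `Kernel driver in use: vfio-pci`. (If the new parameters don't appear in `cat /proc/cmdline` at all, the host may be booting with systemd-boot, e.g. ZFS root on UEFI, where Proxmox reads the command line from `/etc/kernel/cmdline` and applies it with `proxmox-boot-tool refresh` instead of `update-grub`.) The same check via sysfs, as a sketch that assumes the card is at 0000:01:00; the helper name is illustrative:

```shell
#!/bin/bash
# driver_of (illustrative helper): print the driver bound to a PCI device directory
# in sysfs, or "none" if no driver is bound.
driver_of() {
    if [ -L "$1/driver" ]; then
        basename "$(readlink "$1/driver")"
    else
        echo "none"
    fi
}

# Check every function of the GPU at 0000:01:00 (adjust to your card's address).
for dev in /sys/bus/pci/devices/0000:01:00.*; do
    [ -e "$dev" ] || continue                        # glob matched nothing
    echo "$(basename "$dev"): $(driver_of "$dev")"   # expect: vfio-pci
done
```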

Shout-out to @leesteken for pointing out my mistakes. I edited the config; it works fine, and the other VMs seem a bit more stable.
 
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt pcie_acs_override=downstream,multifunction nofb nomodeset initcall_blacklist=sysfb_init video=vesafb:off,efifb:off vfio-pci.ids=10de:2204,10de:1aef disable_vga=1"
video=vesafb:off,efifb:off does nothing because it is wrong in several ways (and ignored).
nofb, nomodeset and iommu=pt probably do nothing. Please let me know if one of them makes a difference for anything.
pcie_acs_override=downstream,multifunction is often not needed and breaks the security isolation between VMs and between VMs and the host. Use with caution.
 
Thank you! I confirmed what you said: I edited my original config above to remove the parameters that weren't necessary, and PCI passthrough still worked fine.
 
I CAN confirm that this works (with an RTX 3090).

My system is an Intel Core i7-6700K on an Asus Z170-E motherboard with 64 GB of DDR4-2400 RAM.

Thank you!!!

(It's been quite a struggle to get GPU passthrough working for me.)

*edit*
Unfortunately, this process does NOT work with the Nvidia GTX 980; that card STILL gives me Windows error code 43.

Sidebar #1: If your GPU's PCI address isn't 01:00.0, you will need to change it in gpu-hookscript.sh. Just an FYI.
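One way to avoid hunting for the address in the script is to keep it in a single variable. Below is a sketch of the same reset logic as the hookscript earlier in the thread, assuming Proxmox calls the hookscript with the VMID and phase as its first two arguments (which is how the original script uses `$1` and `$2`). `GPU_ADDR` and the `SYSFS` override are illustrative additions; the latter exists only so the function can be tried against a fake directory:

```shell
#!/bin/bash
# gpu-hookscript.sh variant: same GPU reset as above, with the address in one place.
# GPU_ADDR is an assumption; set it to your card's full address (see `lspci -D`).
GPU_ADDR="0000:01:00.0"

gpu_hook() {
    local vmid="$1" phase="$2"
    local sysfs="${SYSFS:-/sys}"   # override only for testing
    if [ "$phase" = "pre-start" ]; then
        echo "gpu-hookscript: Resetting GPU $GPU_ADDR for Virtual Machine $vmid"
        # Remove the device (if present), then rescan so it is re-enumerated.
        if [ -e "$sysfs/bus/pci/devices/$GPU_ADDR/remove" ]; then
            echo 1 > "$sysfs/bus/pci/devices/$GPU_ADDR/remove"
        fi
        echo 1 > "$sysfs/bus/pci/rescan"
    fi
}

# Proxmox passes <vmid> <phase>; other phases do nothing and exit 0.
gpu_hook "$@"
```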

Sidebar #2: If you intend to use this in conjunction with virtio-fs, you will likely have to edit the Perl virtio-fs.pl hookscript so that its "pre-start" section looks something like this:

Code:
if ($phase eq 'pre-start') {
    # First phase 'pre-start' will be executed before the guest
    # is started. Exiting with a code != 0 will abort the start
    print "$vmid is starting, doing preparations.\n";
    # Reset the GPU (adjust 0000:01:00.0 to your card's address)
    system('echo 1 > /sys/bus/pci/devices/0000\:01\:00.0/remove');
    system('echo 1 > /sys/bus/pci/rescan');
    # Start the virtio-fs daemon
    system('/var/lib/vz/snippets/launch-virtio-daemon.sh');
    # print "preparations failed, aborting."
    # exit(1);
    # (the remaining phases of the script stay unchanged)
 
This solution (disabling ReBAR) solved my issue - thank you for following up with additional information!

My config also involves an ASUS board, the W680-ACE, paired with an RTX 3090. I had read that another potential solution was to dump the VBIOS and pass it in manually, but I was getting I/O errors when trying to cat the VBIOS/romfile, so I never confirmed whether that works. My kernel cmdline options are pretty minimal and indeed don't need the extra ones mentioned above.
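For reference, the usual way to dump a card's VBIOS from the host is through the device's `rom` file in sysfs: write 1 to enable reading, copy it, then write 0 to disable it again. The read can fail with exactly this kind of I/O error (a commonly reported cause is that the card was used as the boot GPU). Proxmox expects a file given as `romfile=` in the `hostpci` line to be located in `/usr/share/kvm/`. A sketch; the helper name and output path are illustrative:

```shell
#!/bin/bash
# dump_vbios (illustrative helper): copy a PCI device's option ROM to a file.
# $1: device directory in sysfs, e.g. /sys/bus/pci/devices/0000:01:00.0
# $2: output file, e.g. /usr/share/kvm/vbios.rom (needs root)
dump_vbios() {
    local dev="$1" out="$2" status=0
    echo 1 > "$dev/rom" || return 1      # enable reading the ROM
    cat "$dev/rom" > "$out" || status=1  # may fail with an I/O error
    echo 0 > "$dev/rom"                  # disable it again
    return "$status"
}

# Usage on the host:
#   dump_vbios /sys/bus/pci/devices/0000:01:00.0 /usr/share/kvm/vbios.rom
```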

I have to admit there is some BIOS jankiness on this board. At least it has a lot of options, although some of them seem quite broken; for example, "pre-boot IOMMU" causes insane fragmentation of the memory space (to the extent that not even a clean W11 install can boot).
 
Out of an abundance of caution, I downloaded the VBIOS from TechPowerUp's GPU BIOS database.

I'm not sure whether that step was strictly required, but so far, as far as I can tell, it doesn't appear to cause any problems.