[SOLVED] GPU Passthrough with RTX 3090 Doesn't work & DMAR Errors

justingr1

Sep 20, 2022
Hello dear Proxmox Community! :)

I'm trying to set up GPU passthrough with my Nvidia RTX 3090, but I can't get the card to work.

Current software versions:

PVE: 7.3-4
Kernel version: Linux 5.15.83-1

My hardware:

ASUS TUF GAMING B660M-PLUS D4 (BIOS updated yesterday, including Intel ME)
64GB of RAM
Intel i7-13700K

BIOS settings:

IOMMU = Enabled
VT-D = Enabled
Intel Virtualization = Enabled
SR-IOV = Enabled
Above 4G Decoding = Enabled

My /etc/default/grub:

Code:
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt pcie_acs_override=downstream initcall_blacklist=sysfb_init vfio_pci.ids=10de:2204,10de:1aef"

Output of cat /proc/cmdline:

Code:
BOOT_IMAGE=/boot/vmlinuz-5.15.83-1-pve root=/dev/mapper/pve-root ro quiet intel_iommu=on iommu=pt pcie_acs_override=downstream initcall_blacklist=sysfb_init vfio_pci.ids=10de:2204,10de:1aef intel_iommu=on

My /etc/modprobe.d/blacklist.conf:

Code:
blacklist radeon
blacklist nouveau
blacklist nvidia
blacklist snd_hda_intel

My /etc/modules:

Code:
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd
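To check whether the blacklist, /etc/modules and vfio_pci.ids settings actually took effect after a reboot, the "Kernel driver in use" lines of `lspci -nnk` are the thing to look at. A minimal sketch that summarizes them per PCI function (`drivers_in_use` is a made-up helper name, not a Proxmox tool):

```shell
#!/bin/bash
# Hypothetical helper: summarize "Kernel driver in use" per PCI function
# from `lspci -nnk` output read on stdin. Functions without a bound driver
# are reported as "none".
drivers_in_use() {
    awk '
        /^[0-9a-fA-F]/ {                 # a new device line, e.g. "01:00.0 VGA ..."
            if (dev != "") print dev, drv
            dev = $1; drv = "none"
        }
        /Kernel driver in use:/ { drv = $NF }
        END { if (dev != "") print dev, drv }
    '
}
```

For example `lspci -nnk -s 01:00 | drivers_in_use`. With the setup above, both 01:00.0 and 01:00.1 should ideally report vfio-pci; if one of them shows nouveau, nvidia or snd_hda_intel, a host driver grabbed it first.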

I'm getting some DMAR errors (dmesg | grep -e DMAR -e IOMMU):

Code:
[    0.000000] Warning: PCIe ACS overrides enabled; This may allow non-IOMMU protected peer-to-peer DMA
[    0.005879] ACPI: DMAR 0x0000000071FA1000 000088 (v02 INTEL  EDK2     00000002      01000013)
[    0.005913] ACPI: Reserving DMAR table memory at [mem 0x71fa1000-0x71fa1087]
[    0.085759] DMAR: IOMMU enabled
[    0.085783] DMAR: IOMMU enabled
[    0.207224] DMAR: Host address width 39
[    0.207225] DMAR: DRHD base: 0x000000fed90000 flags: 0x0
[    0.207227] DMAR: dmar0: reg_base_addr fed90000 ver 4:0 cap 1c0000c40660462 ecap 29a00f0505e
[    0.207229] DMAR: DRHD base: 0x000000fed91000 flags: 0x1
[    0.207230] DMAR: dmar1: reg_base_addr fed91000 ver 5:0 cap d2008c40660462 ecap f050da
[    0.207233] DMAR: RMRR base: 0x0000007c000000 end: 0x000000807fffff
[    0.207234] DMAR-IR: IOAPIC id 2 under DRHD base  0xfed91000 IOMMU 1
[    0.207235] DMAR-IR: HPET id 0 under DRHD base 0xfed91000
[    0.207235] DMAR-IR: Queued invalidation will be enabled to support x2apic and Intr-remapping.
[    0.208780] DMAR-IR: Enabled IRQ remapping in x2apic mode
[    0.372916] pci 0000:00:02.0: DMAR: Skip IOMMU disabling for graphics
[    0.423115] DMAR: No ATSR found
[    0.423115] DMAR: No SATC found
[    0.423116] DMAR: IOMMU feature fl1gp_support inconsistent
[    0.423117] DMAR: IOMMU feature pgsel_inv inconsistent
[    0.423118] DMAR: IOMMU feature nwfs inconsistent
[    0.423118] DMAR: IOMMU feature dit inconsistent
[    0.423118] DMAR: IOMMU feature sc_support inconsistent
[    0.423119] DMAR: IOMMU feature dev_iotlb_support inconsistent
[    0.423119] DMAR: dmar0: Using Queued invalidation
[    0.423121] DMAR: dmar1: Using Queued invalidation
[    0.423879] DMAR: Intel(R) Virtualization Technology for Directed I/O

IOMMU is active; these are my IOMMU groups:

Code:
IOMMU group 0 00:00.0 Host bridge [0600]: Intel Corporation Device [8086:a703] (rev 01)
IOMMU group 10 00:1a.0 PCI bridge [0604]: Intel Corporation Device [8086:7ac8] (rev 11)
IOMMU group 11 00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:7ab8] (rev 11)
IOMMU group 12 00:1c.2 PCI bridge [0604]: Intel Corporation Device [8086:7aba] (rev 11)
IOMMU group 13 00:1c.4 PCI bridge [0604]: Intel Corporation Device [8086:7abc] (rev 11)
IOMMU group 14 00:1f.0 ISA bridge [0601]: Intel Corporation Device [8086:7a86] (rev 11)
IOMMU group 14 00:1f.3 Audio device [0403]: Intel Corporation Device [8086:7ad0] (rev 11)
IOMMU group 14 00:1f.4 SMBus [0c05]: Intel Corporation Device [8086:7aa3] (rev 11)
IOMMU group 14 00:1f.5 Serial bus controller [0c80]: Intel Corporation Device [8086:7aa4] (rev 11)
IOMMU group 15 01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GA102 [GeForce RTX 3090] [10de:2204] (rev a1)
IOMMU group 15 01:00.1 Audio device [0403]: NVIDIA Corporation GA102 High Definition Audio Controller [10de:1aef] (rev a1)
IOMMU group 16 02:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller PM9A1/980PRO [144d:a80a]
IOMMU group 17 04:00.0 USB controller [0c03]: ASMedia Technology Inc. ASM2142 USB 3.1 Host Controller [1b21:2142]
IOMMU group 18 05:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8125 2.5GbE Controller [10ec:8125] (rev 05)
IOMMU group 19 06:00.0 PCI bridge [0604]: ASMedia Technology Inc. Device [1b21:1806] (rev 01)
IOMMU group 1 00:01.0 PCI bridge [0604]: Intel Corporation Device [8086:a70d] (rev 01)
IOMMU group 20 07:00.0 PCI bridge [0604]: ASMedia Technology Inc. Device [1b21:1806] (rev 01)
IOMMU group 21 07:02.0 PCI bridge [0604]: ASMedia Technology Inc. Device [1b21:1806] (rev 01)
IOMMU group 22 07:06.0 PCI bridge [0604]: ASMedia Technology Inc. Device [1b21:1806] (rev 01)
IOMMU group 23 07:0e.0 PCI bridge [0604]: ASMedia Technology Inc. Device [1b21:1806] (rev 01)
IOMMU group 24 08:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8125 2.5GbE Controller [10ec:8125] (rev 04)
IOMMU group 25 09:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8125 2.5GbE Controller [10ec:8125] (rev 04)
IOMMU group 2 00:02.0 VGA compatible controller [0300]: Intel Corporation Device [8086:a780] (rev 04)
IOMMU group 3 00:06.0 PCI bridge [0604]: Intel Corporation Device [8086:a74d] (rev 01)
IOMMU group 4 00:0a.0 Signal processing controller [1180]: Intel Corporation Device [8086:a77d] (rev 01)
IOMMU group 5 00:0e.0 RAID bus controller [0104]: Intel Corporation Device [8086:a77f]
IOMMU group 6 00:14.0 USB controller [0c03]: Intel Corporation Device [8086:7ae0] (rev 11)
IOMMU group 6 00:14.2 RAM memory [0500]: Intel Corporation Device [8086:7aa7] (rev 11)
IOMMU group 7 00:15.0 Serial bus controller [0c80]: Intel Corporation Device [8086:7acc] (rev 11)
IOMMU group 7 00:15.1 Serial bus controller [0c80]: Intel Corporation Device [8086:7acd] (rev 11)
IOMMU group 7 00:15.2 Serial bus controller [0c80]: Intel Corporation Device [8086:7ace] (rev 11)
IOMMU group 8 00:16.0 Communication controller [0780]: Intel Corporation Device [8086:7ae8] (rev 11)
IOMMU group 9 00:17.0 SATA controller [0106]: Intel Corporation Device [8086:7ae2] (rev 11)
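The group listing above looks like the output of the commonly posted sysfs loop over /sys/kernel/iommu_groups. A sketch of that loop, assuming the standard sysfs layout; unlike a plain `for` loop it sorts the groups numerically (the listing above is sorted as text, which is why group 10 follows group 0). The function names are made up for illustration:

```shell
#!/bin/bash
# Print "<group> <PCI address>" pairs from sysfs, sorted by group number.
# The sysfs root is a parameter (default /sys) so it can be tried on a fake tree.
iommu_groups() {
    local sys=${1:-/sys} g d
    for g in "$sys"/kernel/iommu_groups/*; do
        [ -d "$g" ] || continue          # no groups at all: IOMMU is off
        for d in "$g"/devices/*; do
            [ -e "$d" ] || continue
            printf '%s %s\n' "${g##*/}" "${d##*/}"
        done
    done | sort -k1,1n -k2,2
}

# Same listing with the lspci description of each device appended.
show_iommu_groups() {
    local g d
    iommu_groups "$@" | while read -r g d; do
        echo "IOMMU group $g $(lspci -nns "$d")"
    done
}
```

Running `show_iommu_groups` (needs lspci from pciutils) should print lines in the same format as the listing above.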

My /etc/modprobe.d/kvm.conf:

Code:
options kvm ignore_msrs=1

My VM config:

Code:
agent: 1
args: -cpu host,hv_vapic,+invtsc,-hypervisor
balloon: 0
bios: ovmf
boot: order=sata0;net0
cores: 24
cpu: host,hidden=1,flags=+pcid;+spec-ctrl;+ssbd;+hv-evmcs;+aes
cpuunits: 4500
efidisk0: POOL1:vm-300-disk-0,efitype=4m,pre-enrolled-keys=1,size=1M
hookscript: local:snippets/gpu-hookscript.sh
hostpci0: 0000:01:00,pcie=1,x-vga=on,romfile=Gigabyte.RTX3090.24576.200904.rom
hotplug: disk,network,usb,cpu
kvm: 1
machine: pc-q35-7.1
memory: 60000
meta: creation-qemu=7.1.0,ctime=1672163581
name: JITS08-GS
net0: e1000=E4:70:B8:00:B7:77,bridge=vmbr1,firewall=1
numa: 1
onboot: 1
ostype: win11
sata0: POOL1:vm-300-disk-1,cache=writethrough,discard=on,size=160G,ssd=1
sata1: POOL1:vm-300-disk-3,cache=writethrough,discard=on,size=6600G,ssd=1
scsihw: lsi
smbios1: uuid=b3813edc-4608-485c-a49d-68f352e53df1,manufacturer=QVNVU1RlSyBDT01QVVRFUiBJTkMu,product=VFVGIEdBTUlORyBCNjYwTS1QTFVTIEQ0,version=UmV2IDEueHg=,serial=MjExMTk0MDQwNjAwMTkw,sku=QUxQSEEtSklUUzA4LUdT,family=QUxQSEEgVEVSTUlOQUwgU0VSVkVS,base64=1
sockets: 1
tablet: 1
tpmstate0: POOL1:vm-300-disk-2,size=4M,version=v2.0
vcpus: 24
vga: none
vmgenid: f94603af-eeaf-424c-b565-2e581f8525b0


The machine starts without errors in the GUI, but I cannot ping it.
After I start the VM, the host's RAM usage rises to around 90%.
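A likely explanation for the RAM jump, assuming the `memory:` value in the VM config is in MiB (the Proxmox unit) and the host's 64GB is 64 GiB: with a PCI device passed through, QEMU has to allocate and pin all guest memory up front for DMA, so the full 60000 MiB is taken as soon as the VM starts:

```latex
\frac{60000\ \text{MiB}}{64 \times 1024\ \text{MiB}} = \frac{60000}{65536} \approx 0.916 \quad (\approx 91.6\,\%)
```

So roughly 90% host RAM usage right after start is expected with this config and on its own does not point to the passthrough problem.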

I've already tried many workarounds, but nothing seems to help. Maybe someone skilled has an idea?

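I also tested a hookscript from the forum:

Code:
#!/bin/bash
# Proxmox calls hookscripts with two arguments: <vmid> <phase>
if [ "$2" = "pre-start" ]
then
    echo "gpu-hookscript: Resetting GPU for virtual machine $1"
    # Remove the GPU from the PCI bus, then rescan so the kernel re-detects it
    echo 1 > /sys/bus/pci/devices/0000:01:00.0/remove
    echo 1 > /sys/bus/pci/rescan
fi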
 
Resolved:

If anyone else ever runs into this: I had to disable Resizable BAR (ReBAR) in the BIOS.

I tested it successfully with 2 different Proxmox hosts, both ASUS motherboard systems (one TUF series, the other ROG Strix series), and 3 different Nvidia cards (Gigabyte RTX 3090, Quadro M4000, Quadro P2000).

To help anyone else get out of this rabbit hole, please try this guide: https://hackmd.io/@edingroot/SkGD3Q7Wv

And double-check your BIOS settings.
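Whether ReBAR is currently active can be read from the GPU's PCI capabilities. A sketch that filters the Resizable BAR part out of `lspci -vvv` output (run lspci as root, otherwise it hides the capabilities); the helper name `rebar_sizes` is made up:

```shell
#!/bin/bash
# Print the "BAR n: current size: ..." lines of the Resizable BAR capability
# from `lspci -vvv` output read on stdin.
rebar_sizes() {
    awk '
        /Resizable BAR/ { inrebar = 1; next }   # capability header line
        /Capabilities:/ { inrebar = 0 }         # the next capability ends it
        inrebar && /BAR [0-9]+: current size:/ { sub(/^[ \t]+/, ""); print }
    '
}
```

For example `lspci -vvv -s 01:00.0 | rebar_sizes`. As far as I know, with ReBAR off the card's large BAR (BAR 1 on NVIDIA) stays at 256MB, while with ReBAR on it typically grows to cover the whole VRAM (e.g. 32GB on a 24GB RTX 3090).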
 
Hey! By combining some info I got here with info from another forum, I got Proxmox to pass through my RTX 3090 perfectly (with screen output as well).

I followed this tutorial:
https://www.reddit.com/r/homelab/comments/b5xpua/the_ultimate_beginners_guide_to_gpu_passthrough/

However, it did not cover keeping the host from claiming the GPU devices at boot (reserving them for vfio-pci via the GRUB kernel command line), so I had to take these steps:



Get the vendor:device IDs (for vfio-pci.ids) like this:

lspci -nn | grep -i nvidia
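The IDs are the `[vendor:device]` pairs at the end of each lspci line (10de:2204 and 10de:1aef for the RTX 3090 and its audio function). A small sketch that turns the filtered lspci output into the comma-separated value for vfio-pci.ids; `vfio_ids` is a hypothetical helper name:

```shell
#!/bin/bash
# Read `lspci -nn` lines on stdin, pick every [vendor:device] pair
# (class codes like [0300] have no colon and are skipped), keep the
# first occurrence of each, and join them with commas.
vfio_ids() {
    grep -oE '\[[0-9a-fA-F]{4}:[0-9a-fA-F]{4}\]' |
        sed 's/^\[//; s/\]$//' |
        awk '!seen[$0]++' |
        paste -sd, -
}
```

Usage: `lspci -nn | grep -i nvidia | vfio_ids`. Note that vfio-pci.ids binds every device with those IDs, so a second identical card would be claimed as well.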

Then add them to the GRUB config:

nano /etc/default/grub

Code:
"quiet intel_iommu=on iommu=pt nomodeset initcall_blacklist=sysfb_init vfio-pci.ids=10de:2204,10de:1aef disable_vga=1"

In the end it should look similar to this:

Code:
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt pcie_acs_override=downstream,multifunction nofb nomodeset initcall_blacklist=sysfb_init video=vesafb:off,efifb:off vfio-pci.ids=10de:2204,10de:1aef disable_vga=1"

Then update GRUB:
update-grub

Reboot the system:
reboot
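After the reboot it is worth confirming that the running kernel actually got the new parameters. A minimal sketch, assuming you compare against /proc/cmdline; the helper names are made up. It treats `-` and `_` in parameter names as equal, as the kernel does, so vfio-pci.ids and vfio_pci.ids match:

```shell
#!/bin/bash
# Normalize a kernel parameter: dashes in the name become underscores,
# the value after "=" is left untouched.
norm_param() {
    local name=${1%%=*}
    printf '%s%s\n' "${name//-/_}" "${1#"$name"}"
}

# Print each required parameter missing from the given command line;
# return 1 if any is missing.
missing_params() {
    local cmdline=$1 p w found rc=0
    shift
    local -a words
    read -ra words <<< "$cmdline"
    for p in "$@"; do
        found=0
        for w in "${words[@]}"; do
            if [ "$(norm_param "$w")" = "$(norm_param "$p")" ]; then
                found=1
                break
            fi
        done
        if [ "$found" -eq 0 ]; then
            echo "missing: $p"
            rc=1
        fi
    done
    return "$rc"
}
```

Usage: `missing_params "$(cat /proc/cmdline)" intel_iommu=on vfio-pci.ids=10de:2204,10de:1aef`. If something shows up as missing even though /etc/default/grub looks right, the host may not boot through GRUB at all: Proxmox installs with ZFS on root under UEFI use systemd-boot, which reads /etc/kernel/cmdline and needs `proxmox-boot-tool refresh` instead of `update-grub`.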

Shout-out to @leesteken for pointing out my mistakes. I edited the config above; it works fine, and the other VMs seem a bit more stable.
 
video=vesafb:off,efifb:off does nothing because it is wrong in several ways (and ignored).
nofb, nomodeset and iommu=pt probably do nothing. Please let me know if one of them makes a difference for anything.
pcie_acs_override=downstream,multifunction is often not needed and breaks the security isolation between VMs and between VMs and the host. Use with caution.
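One way to see whether the override is needed for the GPU at all: boot once without pcie_acs_override and look at what else shares the GPU's IOMMU group (the group listing earlier in the thread was taken with the override active, so it may look better isolated than the hardware really is). A sketch assuming the standard sysfs layout; `group_peers` is a hypothetical helper:

```shell
#!/bin/bash
# List the other PCI functions sharing an IOMMU group with the given device.
# The sysfs root is a parameter (default /sys) so it can be tried on a fake tree.
group_peers() {
    local bdf=$1 sys=${2:-/sys} d
    local dir="$sys/bus/pci/devices/$bdf/iommu_group/devices"
    if [ ! -d "$dir" ]; then
        echo "no IOMMU group found for $bdf" >&2
        return 1
    fi
    for d in "$dir"/*; do
        [ -e "$d" ] || continue
        d=${d##*/}
        [ "$d" = "$bdf" ] || echo "$d"
    done
}
```

If `group_peers 0000:01:00.0` prints only the card's own functions (here 0000:01:00.1, the HDMI audio, which is passed through anyway), the override should not be needed for this card.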
 
Thank you! I can confirm your comment: I edited my original config above, taking out the parameters that weren't necessary, and PCI passthrough works fine.
 
I CAN confirm that this works (with an RTX 3090).

My system is an Intel Core i7-6700K on an Asus Z170-E motherboard with 64 GB of DDR4-2400 RAM.

Thank you!!!

(It's been quite a struggle to get GPU passthrough working for me.)

*edit*
Unfortunately, this process does NOT work with the Nvidia GTX 980. That card is STILL giving me Windows error code 43.

Sidebar #1: If your GPU's PCI address isn't 01:00.0, you will need to change it in gpu-hookscript.sh. Just an FYI.
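Instead of editing the address by hand, a hookscript could read it from the VM's own config. A hypothetical variant of gpu-hookscript.sh, assuming the plain `hostpci0: <address>,<options>` form shown in the VM config earlier and the usual config path /etc/pve/qemu-server/<vmid>.conf; like the original, it only removes function 0 and then rescans:

```shell
#!/bin/bash
# Hypothetical gpu-hookscript.sh variant: take the GPU address from the VM
# config instead of hard-coding 0000:01:00.0.
# Proxmox calls hookscripts with two arguments: <vmid> <phase>

# "01:00" or "0000:01:00" -> "0000:01:00.0" (function 0 if none is given)
full_bdf() {
    local a=$1
    [[ $a =~ ^[0-9a-fA-F]{4}: ]] || a="0000:$a"
    [[ $a == *.* ]] || a="$a.0"
    printf '%s\n' "$a"
}

# Print the address part of every hostpciN line of a VM config on stdin.
hostpci_addrs() {
    sed -n 's/^hostpci[0-9]*: \([^,;]*\).*/\1/p'
}

reset_gpu() {
    local vmid=$1 addr bdf
    addr=$(hostpci_addrs < "/etc/pve/qemu-server/$vmid.conf" | head -n1)
    [ -n "$addr" ] || return 0           # no passthrough device configured
    bdf=$(full_bdf "$addr")
    echo "gpu-hookscript: Resetting $bdf for virtual machine $vmid"
    echo 1 > "/sys/bus/pci/devices/$bdf/remove"
    echo 1 > /sys/bus/pci/rescan
}

if [ "${2:-}" = "pre-start" ]; then
    reset_gpu "$1"
fi
```

Entries using other hostpci forms (e.g. several addresses separated by `;`) are only partly handled: the sketch takes the first address of the first hostpci line.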

Sidebar #2: If you intend to use this together with virtio-fs, you will likely have to edit the Perl virtio-fs.pl hookscript so that its "pre-start" section looks something like this:

Code:
if ($phase eq 'pre-start') {
    # First phase 'pre-start' will be executed before the guest
    # is started. Exiting with a code != 0 will abort the start
    print "$vmid is starting, doing preparations.\n";
    # Reset the GPU (adjust 0000:01:00.0 to your GPU's PCI address)
    system('echo 1 > /sys/bus/pci/devices/0000:01:00.0/remove');
    system('echo 1 > /sys/bus/pci/rescan');
    system('/var/lib/vz/snippets/launch-virtio-daemon.sh');
    # print "preparations failed, aborting."
    # exit(1);
}
 
This solution (disabling ReBAR) solved my issue - thank you for following up with additional information!

My config also involves an ASUS board - the W680-ACE, paired with an RTX 3090. I had read another potential solution was to copy out the VBIOS and pass it manually, but I was getting I/O errors trying to cat the VBIOS/romfile, so I never confirmed whether that works. My kernel cmdline options are pretty minimal and indeed do not require the extra ones mentioned above.

I have to admit there is some BIOS jankiness on this board. At least it has a lot of options, although some of them seem quite broken, like "pre-boot IOMMU" causing insane fragmentation of the memory space (to the extent that not even a clean W11 install can boot).
 
Out of an abundance of caution, I downloaded the VBIOS from TechPowerUp's GPU BIOS database.

I'm not sure whether that step was strictly required, but so far, as far as I can tell, it doesn't appear to cause any problems.
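For anyone trying the romfile route: Proxmox looks for the `romfile=` file in /usr/share/kvm/. A sketch of dumping the VBIOS through sysfs (the method that gave I/O errors on the W680-ACE above) plus a quick sanity check of a dumped or downloaded file, since a PCI option ROM image starts with the bytes 0x55 0xAA; some downloaded ROM files reportedly carry extra header data in front of that. The function names are made up:

```shell
#!/bin/bash
# Dump a GPU's VBIOS through sysfs. Needs root; may fail with an I/O error
# on some cards/setups.
dump_vbios() {
    local rom="/sys/bus/pci/devices/$1/rom" out=$2 rc
    echo 1 > "$rom" || return 1    # make the ROM readable
    cat "$rom" > "$out"
    rc=$?
    echo 0 > "$rom"                # disable ROM reads again
    return "$rc"
}

# Succeed if the file starts with the PCI option ROM signature 0x55 0xAA.
has_rom_signature() {
    [ "$(od -An -tx1 -N2 "$1" | tr -d ' \n')" = "55aa" ]
}
```

Usage (with a hypothetical file name): `dump_vbios 0000:01:00.0 /usr/share/kvm/rtx3090.rom && has_rom_signature /usr/share/kvm/rtx3090.rom`.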
 
