GPU PCI passthrough for NVIDIA RTX 4070 on Intel i7-13700

murgo

New Member
Mar 5, 2024
Dear all, I recently bought a ZOTAC MagnusOne with a 13th Gen Intel(R) Core(TM) i7-13700 (1 Socket) and an NVIDIA RTX 4070 card.
I decided to give Proxmox a try, so I installed Proxmox VE 8.1 with kernel version Linux 6.5.13-1-pve and tried GPU PCI passthrough.

I have been following the procedure in the official documentation for the current version of Proxmox VE (https://pve.proxmox.com/pve-docs/pve-admin-guide.html#qm_pci_passthrough) and did the following.

1 I modified /etc/default/grub to contain the following line:
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt initcall_blacklist=sysfb_init"
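
Editing /etc/default/grub only changes the file; whether the parameters actually reached the running kernel can be checked in /proc/cmdline after a reboot. A minimal sketch of such a check follows. The function name has_kernel_param is made up for illustration and is not part of Proxmox.

```shell
# Hypothetical helper: succeed if a kernel parameter appears on a command line.
# $1: parameter to look for, e.g. intel_iommu=on
# $2: command line text; defaults to the running kernel's /proc/cmdline
has_kernel_param() {
    cmdline="${2:-$(cat /proc/cmdline)}"
    case " $cmdline " in
        *" $1 "*) return 0 ;;   # exact, space-delimited match
        *)        return 1 ;;
    esac
}
```

Usage on the host, after rebooting: for p in intel_iommu=on iommu=pt; do has_kernel_param "$p" || echo "missing: $p"; done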

2 I modified the /etc/modules file to have the following:
Code:
vfio
vfio_iommu_type1
vfio_pci

3 I regenerated the initramfs and rebooted:
update-initramfs -u -k all
reboot

4 I checked whether everything was ok with the command lsmod | grep vfio, and I got this:

vfio_pci 16384 0
vfio_pci_core 86016 1 vfio_pci
irqbypass 12288 2 vfio_pci_core,kvm
vfio_iommu_type1 49152 0
vfio 57344 3 vfio_pci_core,vfio_iommu_type1,vfio_pci
iommufd 77824 1 vfio

5 Then I used the command dmesg | grep -e DMAR -e IOMMU -e AMD-Vi, and I got this result:

[ 0.012839] ACPI: DMAR 0x0000000044CA1000 000088 (v02 INTEL EDK2 00000002 01000013)
[ 0.012864] ACPI: Reserving DMAR table memory at [mem 0x44ca1000-0x44ca1087]
[ 0.155636] DMAR: Host address width 39
[ 0.155637] DMAR: DRHD base: 0x000000fed90000 flags: 0x0
[ 0.155640] DMAR: dmar0: reg_base_addr fed90000 ver 4:0 cap 1c0000c40660462 ecap 29a00f0505e
[ 0.155641] DMAR: DRHD base: 0x000000fed91000 flags: 0x1
[ 0.155645] DMAR: dmar1: reg_base_addr fed91000 ver 5:0 cap d2008c40660462 ecap f050da
[ 0.155646] DMAR: RMRR base: 0x0000004e000000 end: 0x000000523fffff
[ 0.155648] DMAR-IR: IOAPIC id 2 under DRHD base 0xfed91000 IOMMU 1
[ 0.155648] DMAR-IR: HPET id 0 under DRHD base 0xfed91000
[ 0.155649] DMAR-IR: Queued invalidation will be enabled to support x2apic and Intr-remapping.
[ 0.156512] DMAR-IR: Enabled IRQ remapping in x2apic mode
[ 0.340822] pci 0000:00:02.0: DMAR: Skip IOMMU disabling for graphics
[ 0.406829] DMAR: Intel-IOMMU force enabled due to platform opt in
[ 0.406834] DMAR: No ATSR found
[ 0.406834] DMAR: No SATC found
[ 0.406835] DMAR: IOMMU feature fl1gp_support inconsistent
[ 0.406835] DMAR: IOMMU feature pgsel_inv inconsistent
[ 0.406835] DMAR: IOMMU feature nwfs inconsistent
[ 0.406836] DMAR: IOMMU feature dit inconsistent
[ 0.406836] DMAR: IOMMU feature sc_support inconsistent
[ 0.406836] DMAR: IOMMU feature dev_iotlb_support inconsistent
[ 0.406837] DMAR: dmar0: Using Queued invalidation
[ 0.406838] DMAR: dmar1: Using Queued invalidation
[ 0.408023] DMAR: Intel(R) Virtualization Technology for Directed I/O
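
A log like the one above can be summarized mechanically. The sketch below assumes that the kernel prints a line containing "DMAR: IOMMU enabled" when intel_iommu=on takes effect, and that the "Enabled IRQ remapping" line indicates interrupt remapping; the function name iommu_status is illustrative only. Fed the output above, it would report no "IOMMU enabled" line and interrupt remapping enabled.

```shell
# Hypothetical helper: summarize IOMMU state from dmesg text read on stdin.
iommu_status() {
    log="$(cat)"
    # Assumed marker for intel_iommu=on having been honored
    if printf '%s\n' "$log" | grep -q 'DMAR: IOMMU enabled'; then
        echo "iommu: enabled"
    else
        echo "iommu: no 'IOMMU enabled' line found"
    fi
    # Marker for interrupt remapping, as seen in the DMAR-IR lines
    if printf '%s\n' "$log" | grep -q 'Enabled IRQ remapping'; then
        echo "irq-remapping: enabled"
    else
        echo "irq-remapping: not reported"
    fi
}
```

Usage on the host: dmesg | iommu_status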

The documentation says I should also check whether DMAR IOMMU is enabled, so I ran dmesg | grep 'remapping' and obtained:

[ 0.155649] DMAR-IR: Queued invalidation will be enabled to support x2apic and Intr-remapping.
[ 0.156512] DMAR-IR: Enabled IRQ remapping in x2apic mode

So it seemed that DMAR IOMMU was not enabled, and I added this:

echo "options vfio_iommu_type1 allow_unsafe_interrupts=1" > /etc/modprobe.d/iommu_unsafe_interrupts.conf

Then I ran pvesh get /nodes/{nodename}/hardware/pci --pci-class-blacklist "" to check whether the GPU devices are in a separate IOMMU group, and got:

0x030000 │ 0x2786 │ 0000:01:00.0 │ 17 │ 0x10de │ AD104 [GeForce RTX 4070] │ │ 0x2714 │
0x040300 │ 0x22bc │ 0000:01:00.1 │ 17 │ 0x10de │ │ │ 0x2714 │
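
The IOMMU group shown in the rows above (17) can also be inspected through sysfs, where each group has a devices directory. A small sketch, assuming the usual /sys/kernel/iommu_groups layout; the function name and its optional root argument are illustrative:

```shell
# Hypothetical helper: list the PCI addresses that belong to one IOMMU group.
# $1: group number, e.g. 17
# $2: sysfs root; defaults to /sys/kernel/iommu_groups
iommu_group_members() {
    root="${2:-/sys/kernel/iommu_groups}"
    for dev in "$root/$1/devices/"*; do
        [ -e "$dev" ] || continue   # group missing or empty: print nothing
        basename "$dev"
    done
}
```

Usage on the host: iommu_group_members 17
If the list contains anything besides the GPU and its audio function, those devices share the group with the card.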


6 Next, I blacklisted the drivers:

Code:
echo "blacklist nouveau" >> /etc/modprobe.d/blacklist.conf
echo "blacklist nvidia*" >> /etc/modprobe.d/blacklist.conf
and rebooted.

Then I ran lspci -k | grep -A 3 "VGA" and got this:

01:00.0 VGA compatible controller: NVIDIA Corporation AD104 [GeForce RTX 4070] (rev a1)
Subsystem: ZOTAC International (MCO) Ltd. AD104 [GeForce RTX 4070]
Kernel modules: nvidiafb, nouveau
01:00.1 Audio device: NVIDIA Corporation Device 22bc (rev a1)
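
In lspci -k output, "Kernel modules" lists the modules that could drive a device, while "Kernel driver in use" names the driver currently bound to it. A sketch that extracts only the bound driver, assuming the short address form lspci prints by default (01:00.0, without a domain prefix); the function name driver_in_use is illustrative:

```shell
# Hypothetical helper: print the driver bound to one PCI device, or "none".
# $1: PCI address as printed by lspci, e.g. 01:00.0
# stdin: output of lspci -k
driver_in_use() {
    awk -v addr="$1" '
        # A device header line starts with bus:device.function
        /^[0-9a-f][0-9a-f]:[0-9a-f][0-9a-f]\.[0-7] / { current = $1 }
        current == addr && /Kernel driver in use:/ {
            sub(/.*Kernel driver in use: */, "")
            print
            found = 1
        }
        END { if (!found) print "none" }
    '
}
```

Usage on the host: lspci -k | driver_in_use 01:00.0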

7 The driver blacklisting did not seem to work. Following the official documentation, I then checked the ROM with the rom parser and got this:

Valid ROM signature found @0h, PCIR offset 170h
PCIR: type 0 (x86 PC-AT), vendor: 10de, device: 2786, class: 030000
PCIR: revision 0, vendor revision: 1
Valid ROM signature found @fc00h, PCIR offset 1ch
PCIR: type 3 (EFI), vendor: 10de, device: 2786, class: 000000
PCIR: revision 3, vendor revision: 0
EFI: Signature Valid, Subsystem: Boot, Machine: X64
Last image

Then I ran lspci -k | grep -A 3 "VGA" again, but I still got this:

01:00.0 VGA compatible controller: NVIDIA Corporation AD104 [GeForce RTX 4070] (rev a1)
Subsystem: ZOTAC International (MCO) Ltd. AD104 [GeForce RTX 4070]
Kernel modules: nvidiafb, nouveau
01:00.1 Audio device: NVIDIA Corporation Device 22bc (rev a1)


8 Now, the documentation says that I should add the line options vfio-pci ids=vendor:device,vendor:device to a .conf file in the folder /etc/modprobe.d/

But if I do that, the system crashes at boot, and I found no way to recover other than reinstalling Proxmox.
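
For reference, the options line takes comma-separated vendor:device pairs in hexadecimal without the 0x prefix. With the values from the pvesh rows in step 5 (vendor 0x10de, devices 0x2786 and 0x22bc) it can be assembled as in the sketch below. The function name vfio_ids_line is illustrative, and nothing here addresses the boot crash itself.

```shell
# Hypothetical helper: build a vfio-pci options line from vendor:device IDs.
# Arguments: one or more IDs such as 10de:2786 (four hex digits each side)
vfio_ids_line() {
    ids=""
    for id in "$@"; do
        case "$id" in
            [0-9a-f][0-9a-f][0-9a-f][0-9a-f]:[0-9a-f][0-9a-f][0-9a-f][0-9a-f]) ;;
            *) echo "invalid id: $id" >&2; return 1 ;;
        esac
        ids="${ids:+$ids,}$id"   # join with commas, no spaces
    done
    [ -n "$ids" ] || return 1    # refuse to print an empty ids= list
    echo "options vfio-pci ids=$ids"
}
```

Usage: vfio_ids_line 10de:2786 10de:22bc prints the line to place in a file under /etc/modprobe.d/.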

9 Regardless of these results, I tried to create a VM with the PCI device attached, and got this error message:
stopped: unable to read tail (got 0 bytes)

At this point, I don’t know what to do.
I would appreciate it if anybody could provide some help.

Thank you in advance
m
 
Dear all, here is a short update.
It seems I partially identified the problem.
When running the command update-initramfs -u -k all , I was getting this error:
Bash:
update-initramfs: Generating /boot/initrd.img-6.5.13-1-pve
Running hook script 'zz-proxmox-boot'..
Re-executing '/etc/kernel/postinst.d/zz-proxmox-boot' in new private mount namespace..
No /etc/kernel/proxmox-boot-uuids found, skipping ESP sync.
update-initramfs: Generating /boot/initrd.img-6.5.11-8-pve
Running hook script 'zz-proxmox-boot'..
Re-executing '/etc/kernel/postinst.d/zz-proxmox-boot' in new private mount namespace..
No /etc/kernel/proxmox-boot-uuids found, skipping ESP sync.

To solve this, I followed the procedure suggested here, which involves using the ext4 filesystem and GRUB.
This allowed me to enable DMAR IOMMU.
Nevertheless, the passthrough of the GPU was still not working, with the same error as before: stopped: unable to read tail (got 0 bytes).
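
The "skipping ESP sync" message above is tied to the file /etc/kernel/proxmox-boot-uuids. A rough sketch for choosing the command that applies kernel command line changes, assuming that a non-empty proxmox-boot-uuids file means the boot partitions are managed by proxmox-boot-tool and that a plain GRUB install uses update-grub otherwise; the function name is illustrative:

```shell
# Hypothetical helper: print which command refreshes the boot configuration.
# $1: path to the uuids file; defaults to /etc/kernel/proxmox-boot-uuids
bootloader_refresh_cmd() {
    uuids="${1:-/etc/kernel/proxmox-boot-uuids}"
    if [ -s "$uuids" ]; then
        # ESPs registered: assume proxmox-boot-tool manages them
        echo "proxmox-boot-tool refresh"
    else
        # No registered ESPs: assume a plain GRUB setup
        echo "update-grub"
    fi
}
```

Usage on the host: bootloader_refresh_cmd prints the command to run after changing the kernel command line.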

Hence, I connected a different GPU using an external box with Thunderbolt, and everything worked with the new GPU (an NVIDIA RTX 2080 Ti).

So the problem now seems to be specific to the NVIDIA RTX 4070.
Has anyone had a similar experience?

I really appreciate any help you can provide.
 
