Hello,
I have been testing different GPU cards on a Dell T7920, with quite good results until about a week ago. At the moment a GTX1080 and an RTX4000 are installed.
I'm using the pve-no-subscription repo, and since the kernel was upgraded from 5.13.19-6-pve (the last fully working version for me) I get the following error when I try to start the VM:
kvm: -device vfio-pci,host=0000:73:00.0,id=hostpci0.0,bus=ich9-pcie-port-1,addr=0x0.0,multifunction=on: Failed to mmap 0000:73:00.0 BAR 3. Performance may be slow
This only affects the RTX4000: it is no longer detected inside the VM, while the GTX1080 still works normally.
I know the kernel update is the cause, because when I pin the last working version (without any other changes), everything works normally again.
Below are some tests with the next two kernel versions released (5.15.30-2-pve and 5.15.35-1-pve).
Bash:
root@projectvm:~# uname -a
Linux projectvm 5.15.35-1-pve #1 SMP PVE 5.15.35-2 (Thu, 05 May 2022 13:54:35 +0200) x86_64 GNU/Linux
root@projectvm:~# qm start 106
no efidisk configured! Using temporary efivars disk.
kvm: -device vfio-pci,host=0000:73:00.0,id=hostpci0.0,bus=ich9-pcie-port-1,addr=0x0.0,multifunction=on: Failed to mmap 0000:73:00.0 BAR 3. Performance may be slow
root@projectvm:~# proxmox-boot-tool kernel pin 5.15.30-2-pve
Overriding previously pinned version '5.15.35-1-pve' with '5.15.30-2-pve'
Set kernel '5.15.30-2-pve' in /etc/kernel/proxmox-boot-pin.
Refresh the actual boot ESPs now? [yN] y
Running hook script 'proxmox-auto-removal'..
Running hook script 'zz-proxmox-boot'..
Re-executing '/etc/kernel/postinst.d/zz-proxmox-boot' in new private mount namespace..
Copying and configuring kernels on /dev/disk/by-uuid/6CA5-071E
Copying kernel and creating boot-entry for 5.13.19-6-pve
Copying kernel and creating boot-entry for 5.15.30-2-pve
Copying kernel and creating boot-entry for 5.15.35-1-pve
Copying and configuring kernels on /dev/disk/by-uuid/6CA5-51E5
Copying kernel and creating boot-entry for 5.13.19-6-pve
Copying kernel and creating boot-entry for 5.15.30-2-pve
Copying kernel and creating boot-entry for 5.15.35-1-pve
root@projectvm:~# reboot
root@projectvm:~# uname -a
Linux projectvm 5.15.30-2-pve #1 SMP PVE 5.15.30-3 (Fri, 22 Apr 2022 18:08:27 +0200) x86_64 GNU/Linux
root@projectvm:~# qm start 106
no efidisk configured! Using temporary efivars disk.
kvm: -device vfio-pci,host=0000:73:00.0,id=hostpci0.0,bus=ich9-pcie-port-1,addr=0x0.0,multifunction=on: Failed to mmap 0000:73:00.0 BAR 3. Performance may be slow
root@projectvm:~# proxmox-boot-tool kernel pin 5.13.19-6-pve
[...]
root@projectvm:~# reboot
root@projectvm:~# uname -a
Linux projectvm 5.13.19-6-pve #1 SMP PVE 5.13.19-15 (Tue, 29 Mar 2022 15:59:50 +0200) x86_64 GNU/Linux
root@projectvm:~# qm start 106
no efidisk configured! Using temporary efivars disk.
root@projectvm:~# <-- Working again !
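To compare the kernels, a small helper like the one below could show which host address range BAR 3 maps to and what else has claimed it. This is only a sketch: the function names `parse_mmap_failure` and `bar_range` are made up for illustration, the device address is taken from the error above, and it assumes the usual sysfs layout where line N+1 of a device's `resource` file describes BAR N.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of any Proxmox tool.

# Extract "<pci-address> <bar-index>" from a QEMU "Failed to mmap" message.
parse_mmap_failure() {
    printf '%s\n' "$1" |
        sed -n 's/.*Failed to mmap \([0-9a-fA-F:.]*\) BAR \([0-9]*\).*/\1 \2/p'
}

# Print "start end" (hex) for BAR $2 of device $1, read from sysfs.
# Assumes line N+1 of the "resource" file describes BAR N.
bar_range() {
    res="/sys/bus/pci/devices/$1/resource"
    [ -r "$res" ] || return 1
    sed -n "$(( $2 + 1 ))p" "$res" | awk '{ print $1, $2 }'
}

msg='kvm: -device vfio-pci,host=0000:73:00.0,id=hostpci0.0,bus=ich9-pcie-port-1,addr=0x0.0,multifunction=on: Failed to mmap 0000:73:00.0 BAR 3. Performance may be slow'

# Split the parsed result into device address and BAR index.
set -- $(parse_mmap_failure "$msg")
dev=$1
bar=$2
echo "device=$dev bar=$bar"

if range=$(bar_range "$dev" "$bar"); then
    echo "BAR $bar range: $range"
    # Run as root: unprivileged reads of /proc/iomem show zeroed addresses.
    grep -A 3 "$dev" /proc/iomem || true
else
    echo "device $dev not present on this host"
fi
```

Running it once on 5.13.19-6-pve and once on 5.15.35-1-pve and comparing the `/proc/iomem` excerpts might show whether something other than vfio-pci holds part of the region on the newer kernel.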
Some configuration files:
Bash:
root@projectvm:~# cat /proc/cmdline
initrd=\EFI\proxmox\5.13.19-6-pve\initrd.img-5.13.19-6-pve root=ZFS=rpool/ROOT/pve-1 boot=zfs video=efifb:off intel_iommu=on irqpoll
root@projectvm:~# cat /proc/cmdline
initrd=\EFI\proxmox\5.15.30-2-pve\initrd.img-5.15.30-2-pve root=ZFS=rpool/ROOT/pve-1 boot=zfs video=efifb:off intel_iommu=on irqpoll
root@projectvm:~# cat /proc/cmdline
initrd=\EFI\proxmox\5.15.35-1-pve\initrd.img-5.15.35-1-pve root=ZFS=rpool/ROOT/pve-1 boot=zfs video=efifb:off intel_iommu=on irqpoll
(using 5.15.35-1-pve)
Bash:
root@projectvm:~# dmesg | grep -e DMAR -e IOMMU
[ 0.020385] ACPI: DMAR 0x0000000069800DF8 000270 (v01 DELL\x CBX3 00000001 INTL 20091013)
[ 0.020470] ACPI: Reserving DMAR table memory at [mem 0x69800df8-0x69801067]
[ 0.160165] DMAR: IOMMU enabled
[ 0.376725] DMAR: Host address width 46
[ 0.376727] DMAR: DRHD base: 0x000000d37fc000 flags: 0x0
[ 0.376736] DMAR: dmar0: reg_base_addr d37fc000 ver 1:0 cap 8d2078c106f0466 ecap f020df
[ 0.376742] DMAR: DRHD base: 0x000000e0ffc000 flags: 0x0
[ 0.376748] DMAR: dmar1: reg_base_addr e0ffc000 ver 1:0 cap 8d2078c106f0466 ecap f020df
[ 0.376753] DMAR: DRHD base: 0x000000ee7fc000 flags: 0x0
[ 0.376758] DMAR: dmar2: reg_base_addr ee7fc000 ver 1:0 cap 8d2078c106f0466 ecap f020df
[ 0.376763] DMAR: DRHD base: 0x000000fbffc000 flags: 0x0
[ 0.376768] DMAR: dmar3: reg_base_addr fbffc000 ver 1:0 cap 8d2078c106f0466 ecap f020df
[ 0.376772] DMAR: DRHD base: 0x000000a0ffc000 flags: 0x0
[ 0.376777] DMAR: dmar4: reg_base_addr a0ffc000 ver 1:0 cap 8d2078c106f0466 ecap f020df
[ 0.376781] DMAR: DRHD base: 0x000000a47fc000 flags: 0x0
[ 0.376791] DMAR: dmar5: reg_base_addr a47fc000 ver 1:0 cap 8d2078c106f0466 ecap f020df
[ 0.376796] DMAR: DRHD base: 0x000000c5ffc000 flags: 0x0
[ 0.376801] DMAR: dmar6: reg_base_addr c5ffc000 ver 1:0 cap 8d2078c106f0466 ecap f020df
[ 0.376805] DMAR: DRHD base: 0x000000a0bfc000 flags: 0x1
[ 0.376810] DMAR: dmar7: reg_base_addr a0bfc000 ver 1:0 cap 8d2078c106f0466 ecap f020df
[ 0.376814] DMAR: RMRR base: 0x0000006bc56000 end: 0x0000006bc66fff
[ 0.376818] DMAR: ATSR flags: 0x0
[ 0.376821] DMAR: ATSR flags: 0x0
[ 0.376823] DMAR: RHSA base: 0x000000a0bfc000 proximity domain: 0x0
[ 0.376827] DMAR: RHSA base: 0x000000a0ffc000 proximity domain: 0x0
[ 0.376829] DMAR: RHSA base: 0x000000a47fc000 proximity domain: 0x0
[ 0.376832] DMAR: RHSA base: 0x000000c5ffc000 proximity domain: 0x0
[ 0.376835] DMAR: RHSA base: 0x000000d37fc000 proximity domain: 0x1
[ 0.376837] DMAR: RHSA base: 0x000000e0ffc000 proximity domain: 0x1
[ 0.376840] DMAR: RHSA base: 0x000000ee7fc000 proximity domain: 0x1
[ 0.376842] DMAR: RHSA base: 0x000000fbffc000 proximity domain: 0x1
[ 0.376847] DMAR-IR: IOAPIC id 12 under DRHD base 0xc5ffc000 IOMMU 6
[ 0.376851] DMAR-IR: IOAPIC id 11 under DRHD base 0xa47fc000 IOMMU 5
[ 0.376854] DMAR-IR: IOAPIC id 10 under DRHD base 0xa0ffc000 IOMMU 4
[ 0.376857] DMAR-IR: IOAPIC id 18 under DRHD base 0xfbffc000 IOMMU 3
[ 0.376860] DMAR-IR: IOAPIC id 17 under DRHD base 0xee7fc000 IOMMU 2
[ 0.376863] DMAR-IR: IOAPIC id 16 under DRHD base 0xe0ffc000 IOMMU 1
[ 0.376866] DMAR-IR: IOAPIC id 15 under DRHD base 0xd37fc000 IOMMU 0
[ 0.376869] DMAR-IR: IOAPIC id 8 under DRHD base 0xa0bfc000 IOMMU 7
[ 0.376872] DMAR-IR: IOAPIC id 9 under DRHD base 0xa0bfc000 IOMMU 7
[ 0.376875] DMAR-IR: HPET id 0 under DRHD base 0xa0bfc000
[ 0.376879] DMAR-IR: Queued invalidation will be enabled to support x2apic and Intr-remapping.
[ 0.379165] DMAR-IR: Enabled IRQ remapping in x2apic mode
[ 34.087511] DMAR: No SATC found
[ 34.087516] DMAR: dmar6: Using Queued invalidation
[ 34.087522] DMAR: dmar5: Using Queued invalidation
[ 34.087528] DMAR: dmar4: Using Queued invalidation
[ 34.087547] DMAR: dmar3: Using Queued invalidation
[ 34.087551] DMAR: dmar2: Using Queued invalidation
[ 34.087555] DMAR: dmar1: Using Queued invalidation
[ 34.087559] DMAR: dmar0: Using Queued invalidation
[ 34.087571] DMAR: dmar7: Using Queued invalidation
[ 34.180091] DMAR: Intel(R) Virtualization Technology for Directed I/O
[ 41.831310] snd_emu10k1 0000:d6:00.0: non-passthrough IOMMU detected, widening DMA allocations
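Since the IOMMU itself comes up on 5.15, another thing that could be compared between kernels is the IOMMU group layout. The following is a minimal sketch of the usual sysfs walk; the function name `list_iommu_groups` and its optional root-directory argument are my own additions for illustration.

```shell
#!/bin/sh
# Sketch only: list every device together with its IOMMU group number.
# $1 is an optional root directory (defaults to the standard sysfs path).
list_iommu_groups() {
    root=${1:-/sys/kernel/iommu_groups}
    for d in "$root"/*/devices/*; do
        [ -e "$d" ] || continue        # glob did not match anything
        g=${d#"$root"/}                # strip the root prefix
        g=${g%%/*}                     # keep only the group number
        printf 'group %s: %s\n' "$g" "${d##*/}"
    done
}

list_iommu_groups
```

Saving the output under each kernel and diffing the two files would show whether 0000:73:00.0 ends up grouped differently after the upgrade.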
Bash:
root@projectvm:~# cat /etc/modules
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd
Bash:
root@projectvm:~# dmesg | grep 'remapping'
[ 0.376879] DMAR-IR: Queued invalidation will be enabled to support x2apic and Intr-remapping.
[ 0.379165] DMAR-IR: Enabled IRQ remapping in x2apic mode
Bash:
root@projectvm:~# cat /etc/modprobe.d/*
blacklist nouveau
blacklist nvidia
options kvm ignore_msrs=1
blacklist nvidiafb
options vfio-pci ids=10de:1b80,10de:10f0,10de:1eb0,10de:10f8,10de:1ad8,10de:1ad9,10de:1eb1,10de:10f8,10de:1ad8,10de:1ad9,10de:1cb3,10de:0fb9,10de:107d,10de:0e08 disable_vga=1
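The id list above repeats a few entries. That is probably harmless, but a quick check like this one lists the duplicates so the option line can be tidied up. It is just a sketch; `duplicate_ids` is a hypothetical helper name, and the id string is copied from the `options vfio-pci` line.

```shell
#!/bin/sh
# Sketch only: print every vendor:device id that appears more than once
# in a comma-separated vfio-pci "ids=" list.
duplicate_ids() {
    printf '%s\n' "$1" | tr ',' '\n' | LC_ALL=C sort | uniq -d
}

ids='10de:1b80,10de:10f0,10de:1eb0,10de:10f8,10de:1ad8,10de:1ad9,10de:1eb1,10de:10f8,10de:1ad8,10de:1ad9,10de:1cb3,10de:0fb9,10de:107d,10de:0e08'
duplicate_ids "$ids"
```

For the list above this reports 10de:10f8, 10de:1ad8 and 10de:1ad9.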
I know that the pve-no-subscription repository is less tested and used at my own risk, but I don't even know where to start investigating the problem.
For now I am continuing my tests on 5.13.19-6-pve, but I would like to understand what is going on.
Any ideas?
Many thanks!